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GENERALIZED  HILL  CLIMBING  ALGORITHMS 
FOR  DISCRETE  OPTIMIZATION  PROBLEMS 

by 

Alan  W  Johnson 
Sheldon  H.  Jacobson,  Chair 
Industrial  and  Systems  Engineering 
(ABSTRACT) 

Generalized  hill  climbing  (GHC)  algorithms  are  introduced,  as  a  tool  to  address 
diflBcult  discrete  optimization  problems.  Particular  formulations  of  GHC  algorithms 
include  simulated  annealing  (SA),  local  search,  and  threshold  accepting  (TA),  among 
others.  A  proof  of  convergence  of  GHC  algorithms  is  presented,  that  relaxes  the  sufficient 
conditions  for  the  most  general  proof  of  convergence  for  stochastic  search  algorithms  in 
the  literature  (Anily  and  Federgruen  [1987]). 

Proofs  of  convergence  for  SA  are  based  on  the  concept  that  deteriorating  (hill 
climbing)  transitions  between  neighboring  solutions  are  accepted  by  comparing  a 
deterministic  function  of  both  the  solution  change  cost  and  a  temperature  parameter  to  a 
uniform  (0,1)  random  variable.  GHC  algorithms  represent  a  more  general  model,  whereby 
deteriorating  moves  are  accepted  according  to  a  general  random  variable. 

Computational  results  are  reported  that  illustrate  relationships  that  exist  between 
the  GHC  algorithm’s  finite-time  performance  on  three  problems,  and  the  general  random 
variable  formulations  used.  The  dissertation  concludes  with  suggestions  for  further 


research. 


CHAPTER  1:  INTRODUCTION 


1.1  Motivation 

Many  discrete  optimization  (minimization)  problems  belong  to  a  class  of  problems 
that  are  difficult  to  solve,  i.e.,  the  class  of  NP-complete  problems  (Garey  and  Johnson 
[1979,  pg  13]).  There  is  no  known  polynomial-time  algorithm  that  can  solve  any  problem 
in  this  class.  One  classical  example  is  the  traveling  salesman  problem  (Garey  and  Johnson 
[1979,  pg  4]).  Given  a  list  of  J  nodes,  the  problem  is  to  find  a  Hamiltonian  circuit  of 
minimum  cost.  (Note  that  a  solution  is  an  ordering  of  the  J  nodes,  with  total  cost  equal  to 
the  sum  of  the  costs  of  the  corresponding  {J  + 1)  arcs  connecting  the  nodes.  The  total 
number  of  solutions  is  of  order  ( J  - 1)! .) 

Since  the  NP-complete  class  of  problems  contains  many  examples  of  practical 
interest,  heuristic  methods  have  been  developed  that  efficiently  find  near-optimal  solutions. 
Sangiovanni-Vincentelli  [1991]  separates  heuristic  methods  into  two  conceptual  classes:  a 
class  that  computes  the  best  solution  constructively  starting  from  raw  data,  and  a  class 
that  iteratively  improves  upon  an  existing  solution.  Constructive  methods  tend  to  exploit 
specific  features  of  the  problem  to  be  solved  and  are  therefore  difficult  to  generalize, 
while  iterative  methods  are  more  fiexible.  Iterative  methods  all  share  the  same  basic 
structure:  starting  from  an  initial  solution,  a  sequence  of  solutions  are  generated  until  a 
termination  criterion  is  satisfied.  Iterative  algorithms  are  specified  by  the  rules  for 
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generating  and  accepting  new  solutions,  and  by  termination  criteria.  Greedy  iterative 
algorithms  select  only  those  solutions  whose  costs  are  less  than  or  equal  to  the  cost  of  the 
incumbent  solution,  and  therefore  generally  become  trapped  in  local  optima.  One 
particular  greedy  iterative  algorithm  is  local  search  (Papadimitriou  and  Steiglitz  [1982]). 
Stochastic  hill  climbing  algorithms  have  the  ability  to  probabilistically  accept  candidate 
solutions  with  higher  cost  than  that  of  the  incumbent  solution,  in  an  effort  to  escape  local 
optima. 

1.2  Research  Goals 

This  research  explores  how  generaliang  the  solution  acceptance  function  model  of 
stochastic  hill  climbing  algorithms  can  improve  their  performance  on  hard  discrete 
optimization  problems.  A  frequently  used  stochastic  hill  climbing  algorithm  for  discrete 
optimization  is  simulated  annealing  (SA)  (Eglese  [1990]).  SA  exploits  the  analogy  of 
discrete  optimization  to  the  physical  annealing  of  crystalline  solids,  in  which  a  solid  is 
cooled  very  slowly  from  some  elevated  temperature  and  thereby  allowed  to  relax  toward 
its  low  energy  states.  The  appeal  of  SA  derives  from  its  guarantee  of  asymptotic 
convergence  to  a  global  extremum. 

A  key  feature  of  stochastic  hill  climbing  algorithms  is  their  potential  to  escape  local 
optima.  For  example,  proofs  of  convergence  of  SA  are  based  on  the  concept  that 
deteriorating  (hill  climbing)  transitions  between  solutions  are  probabilistically  accepted  by 
comparing  a  deterministic  function  of  both  the  solution  change  cost  and  a  temperature 
parameter  to  a  uniform  (0,1)  random  variable.  This  research  examines  a  more  general 
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acceptance  probability  model,  titled  generalized  hill  climbing  (GHC),  where  deteriorating 
moves  are  accepted  according  to  a  general  random  variable. 

One  limitation  of  SA  is  that  traditional  SA  convergence  theory  fixes  the  random 
variable  as  an  exponential  function.  Anily  and  Federgruen  [1987]  present  an  SA 
convergence  theory  that  addresses  more  general  acceptance  probability  fiinctions,  but  then- 
theory  requires  a  restrictive  sufficient  condition  that  is  very  difficult  to  verify;  furthermore, 
they  do  not  provide  computational  results  to  address  whether  different  acceptance 
probability  functions  would  affect  SA’s  finite-time  performance  (in  terms  of  solution 
quality  versus  algorithm  execution  time).  In  essence,  no  unifying  convergence  theory  has 
been  developed  that  provides  sufficient  conditions  on  any  acceptance  probability 
distribution  function,  for  stochastic  hill  climbing  algorithms  to  achieve  asymptotic 
convergence  to  a  global  extremum. 

Another  limitation  of  SA  is  that  the  convergence  behavior  is  asymptotic.  Thus 
global  optimality  is  obtained  only  after  an  infinite  number  of  algorithm  iterations. 
Although  the  asymptotic  behavior  of  SA  has  been  extensively  studied,  it  is  the  finite-time 
behavior  that  interests  practitioners  (Romeo  and  Sangiovanni-Vincentelli  [1991],  Strenski 
and  Kirkpatrick  [1991],  and  Tovey  [1988]). 

The  contributions  fi-om  this  research  focus  on  two  areas:  first,  a  new  method  of 
proving  convergence  of  stochastic  hill  climbing  algorithms  is  presented,  that  relaxes  the 
sufficient  conditions  found  in  the  literature.  This  result  creates  a  large  body  of  convergent 
stochastic  hill  climbing  algorithms  where  only  SA  existed  previously.  Second,  tests  of  the 
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performance  of  selected  probability  distributions  for  the  general  random  variable  on 
specific  problems  are  conducted.  These  tests  empirically  study  which  probability 
distributions  enhance  the  GHC  algorithm’s  finite-time  performance  (in  terms  of  solution 
quality  versus  algorithm  execution  time)  on  these  problems. 

1.3  The  Generalized  Hill  Climbing  Algorithm 

Define  a  discrete  optimization  minimization  problem  as  a  three-tuple  (Q,5V,c)  where: 

1.  Q  is  a  finite  solution  space  composed  of  card(Ci)  elements,  where  ca/'i/(Q) 
is  the  cardinality  of  Q, 

2.  is  a  neighborhood  function  of  Q, 

3.  c:  n  ->  is  a  non-negative  objective  function. 

Generalized  Hill  Climbing  (GHC)  algorithms  (depicted  in  pseudo-code  in  Figure  1.1)  are 
initialized  with  solution  /  efl,  having  objective  function  value  c, .  The  total  number  of 


Initialization:  specify  (G,  W,c) ,  and  select  initial  solution  /  gQ 
While  stopping  criterion  not  met: 

Set  the  outer  loop  counter  ^  =  0 
While  iteration  k  ^K: 

Set  the  inner  loop  counter  m  =  0 
While  m^M  : 

Generate  j  g  5V(;)  according  to  probability  g, ^  {k) 

Calculate  the  change  in  objective  function  value  a,  ^  =  Cj  -  c, 
accept  solution  j  (i  <=  J)  if  {i,j)  >  a,  ^ 


/w  <=/w-t-l 
k  <=:k  +  l 


Figure  1.1.  The  Generalized  Hill  Climbing  algorithm. 
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outer  loop  iterations  K\  the  total  number  of  inner  loop  iterations  M;  a  nonnegative  random 
variable  such  that  i,j  eQ.y  and  k  eK;  and  a  stopping  criterion  must 

all  be  specified. 

1.4  Research  Questions 

Two  research  questions  are  investigated. 

1)  Asymptotic  Optimality:  Given  (at  iteration  k)  a  solution  /  eQ  and  a  neighbor 
j  e  9^(i) ,  where  solution  j  is  generated  with  probability  gfj(k) ,  and  the  transition  from 

i  to  j  occurs  if 

RAhi)^>^i,j 

what  are  sufficient  conditions  on  (i,j)  such  that 

\\mVr{solution  j  e  {set  of  globally  optimal  solutions  of  Q})  =  1  ?  (1.1) 

2)  Finite-Time  Performance;  Is  there  a  connection  between  selected  probability 

distribution  functions  for  R^(iJ)  and  the  finite-time  performance  (in  terms  of  solution 
quality  versus  algorithm  execution  time)  of  the  GHC  algorithm  on  a  specified  set  of 
problems^  (1.2) 
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CHAPTER  2:  LITERATURE  REVIEW 


Chapter  2  reviews  the  probabilistic  hill  climbing  algorithm  literature.  Section  2.1 
addresses  convergence  properties  and  finite-time  behavior  of  the  simulated  annealing 
algorithm.  Section  2.2  discusses  the  threshold  accepting  algorithm,  while  Section  2.3 
addresses  probabilistic  tabu  search,  the  noising  method,  and  genetic  algorithms. 

2.1  Simulated  Annealing 

2.1.1  Overview 

Simulated  annealing  (SA)  is  a  local  search  algorithm  with  the  capability  to  escape 
from  local  optima.  It  is  often  used  to  solve  nonconvex  discrete  optimization  problems. 
Four  recent  survey  articles  that  provide  a  good  overview  of  SA's  theoretical  development 
and  application  are  Eglese  [1990],  Fleischer  [1995],  Koulamas  et  al.  [1994],  and  Romeo 
and  Sangiovanni-Vincentelli  [1991].  Aarts  and  Korst  [1989]  and  Laarhoven  and  Aarts 
[1987]  devote  entire  books  to  the  subject.  Fox  [1995]  shows  that  selected  methods  of 
improving  the  finite-time  performance  of  SA  do  not  detract  from  its  asymptotic 
convergence  properties.  The  S  A  algorithm  is  depicted  in  Figure  2. 1 . 

This  section  reviews  the  basic  SA  algorithm,  its  convergence  properties,  and  its 
finite-time  behavior.  Note  that  although  a  substantial  literature  exists  for  the  application  of 
S  A  to  problems  with  continuous  variables,  this  review  focuses  only  on  discrete  problems. 
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Initialization:  specify  (Q,5V,c) ,  a  temperature  parameter 
and  select  an  initial  solution  i  €  Q 
While  stopping  criterion  not  met: 

Set  the  outer  loop  counter  k  =  Q 
While  iteration  k^K  : 

Set  the  inner  loop  counter  m  =  0 
While  m^M  : 

Generate  j  according  to  probability  gi  j(k) 

Calculate  the  change  in  objective  function  value  a,.  =  Cj  -  c,. 
If  A,  ^  <  0,  then  accept  solution  j  (i  <=  j) 

Else,  generate  a  random  number  U  ~C/(0,1) 

If  C/  <  exp(-A,.j.  /  j,  then  (/  <=  j) 

Else,  (/  <=  i) 
m<^m  +  \ 
k  <=k  +  \ 


Figure  2.1.  The  Simulated  Annealing  algorithm. 


SA  is  so  named  because  of  its  analogy  to  physical  annealing  with  solids,  in  which  a 
crystalline  solid  is  heated  and  then  allowed  to  cool  very  slowly  until  it  achieves  its  most 
regular  possible  crystal  lattice  configuration  (i.e.,  its  minimum  lattice  energy  state),  and 
thus  is  free  of  crystal  defects.  SA  establishes  the  connection  between  this  type  of 
thermodynamic  behavior  and  the  search  for  the  global  minimum  of  an  objective  function  in 
a  discrete  optimization  problem;  and  further,  it  provides  an  algorithmic  means  for 
exploiting  the  connection.  S  A  is  based  on  the  Metropolis  acceptance  criterion  (Metropolis 
et  al.  [1953])  which  models  how  a  thermodynamic  system  moves  from  its  current  solution 
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(state)  to  a  candidate  solution,  in  which  the  energy  content  is  being  minimized.  The 
probability  of  making  such  a  move  is 

rexp(-A,.  . A,  .  >0 
Pr( Accept  candidate  j  as  next  solution}=|^  ’  <0 

where  Q  is  a  finite  solution  space,  i,jeCi  are  the  current  and  candidate  solutions  of  the 
system,  respectively,  and  is  the  temperature  parameter  at  (outer  loop)  iteration  k,  such 
that 

t,.>0  for  all  k  and  lim  ty  =  0.  (2.2) 

t-vco 

Let  s  Cj  -Cj,  where  c,  and  denote  the  energies  associated  with  solutions  i  and  j, 

respectively.  The  candidate  solution  j  is  chosen  at  random  from  among  the  set  of 
neighbors  of  solution  /,  defined  by  and  becomes  the  current  solution,  based  on  the 
acceptance  probability  in  (2.1).  This  acceptance  probability  is  the  basic  element  of  the 
search  mechanism  in  SA.  If  the  probability  of  generating  a  candidate  solution  j  from  the 
neighbors  of  solution  i  is  gi  j(k) ,  where 

W  =  for  all i &Q,  k  =  1,2,-. ■  (2.3) 

yewco 

then  a  nonnegative  square  stochastic  matrix  P(k)  can  be  defined  with  transition 
probabilities 
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Wexp(-A,j  /  /*)  j  e  ^(i),  j  /• 

W  = '  0  ji  9^(i),  j  ^  i  (2.4) 

1-Z^uW  7  =  /- 

leK(i) 

Hi 

V. 

for  all  solutions  /  gQ,  and  all  ^  =  1,2,...  .  These  transition  probabilities  define  an 
inhomogeneous  Markov  chain  (Romeo  and  Sangiovanni-Vincentelli  [1991]).  Note  that 
boldface  type  indicates  matrix/vector  notation,  and  all  vectors  are  row  vectors,  unless 
otherwise  indicated. 

The  key  characteristic  of  SA  is  that  it  provides  a  means  of  escaping  fi-om  local 
optima  by  allowing  hill  climbing  moves  (i.e.,  moves  which  may  worsen  the  objective 
function  value).  As  the  temperature  parameter  4  is  decreased  to  zero,  hill  climbing 
moves  occur  less  fi-equently,  and  the  solution  distribution  associated  with  the 
inhomogeneous  Markov  chain  converges  to  a  form  in  which  all  the  probability  is 
concentrated  on  the  set  of  globally  optimal  solutions. 

2.1.2  Homogeneous  Markov  Chain  Theory 

Convergence  proofs  are  grouped  into  two  approaches:  homogeneous  and 
inhomogeneous.  The  homogeneous  Markov  chain  approach  (Aarts  and  Laarhoven 
[1985],  Faigle  and  Kern  [1991],  Granville  et  al.  [1994],  Lundy  and  Mees  [1986],  Mitra  et 
al.  [1986],  and  Rossier  et  al.  [1986])  assumes  that  each  temperature  4  is  held  constant  for 
a  sufficient  number  of  iterations  m  for  the  stochastic  matrix  Pik)  to  approach  its  stationary 
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probability  distribution  %{k) .  (Note  that  in  the  interest  of  simplifying  notation  as  much  as 
possible,  the  inner  loop  index  m  is  suppressed.  However,  the  reader  should  interpret  the 
index  k  as  the  double  index  k,m,  where  a  sequence  of  /«  =  1,2,..., M  SA  iterations  occur 
for  each  fixed  k.)  The  existence  of  stationary  distributions  for  each  temperature  is  based 
on  the  following  theorem.  (Note:  to  ensure  that  Theorem  2. 1  is  consistent  with  the  SA 
algorithm  depicted  in  Figure  2.1,  without  loss  of  generality,  let  be  a  function  only  of 
each  outer  loop  iteration  k,  and  let  the  respective  number  of  inner  loop  iterations  M  and 
outer  loop  iterations  K  each  approach  infinity). 

Theorem  2.1:  Let  Pi  j{k)  be  the  probability  of  going  from  solution  i  to  solution  j 

in  one  step  at  iteration  k,  and  let  P^pik)  be  the  probability  of  going  from  solution  i  to 
solution  j  in  m  steps.  If  a  Markov  chain  is  irreducible  and  aperiodic  with  finitely  many 
solutions,  then  Mm {k)  =  % fk)  exists  for  all  i^jeO.  and  iterations  k  . 

Furthermore,  Kj  (k)  is  the  unique  strictly  positive  solution  of 

=  Y,T^fk)P,  j(k),forall  j  sQ,  (2.5) 

and 

=  (2-6) 
i^a 

Proof  Define  Cinlar’s  [1975,  pg  153]  result  as  a  function  of  each  iteration  k,  and  the 
result  follows.  ■ 
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Key  requirements  for  the  existence  of  stationary  distributions  and  for  convergence  of  the 
sequence  of  nik)  vectors  include: 

a)  transition  matrix  irreducibility  (for  every  finite  iteration  k,  the  transition  matrix 
assigns  a  path  of  nonzero  probability  between  any  two  solutions  ij  eQ), 

b)  aperiodicity  (starting  at  solution  7  s  Q,  it  is  possible  to  return  to  j  in  any 
number  of  inner-loop  iterations  ni), 

c)  a  stationary  transition  matrix  probability  distribution  (which,  when  k  goes  to 
infinity,  is  nonzero  only  at  globally  optimal  solutions). 

Note  that  all  SA  proofs  of  convergence  in  the  literature  that  are  based  on 
homogeneous  Markov  chain  theory,  either  explicitly  or  implicitly  require  the  sufficient 
condition  of  reversibility  (also  called  detailed  balance)  (Ross  [1993,  pg  172]),  defined  as 

7t,  {k)Pi  j  (k)  =  Kj  (k)Pj^  (k) ,  /or  all  i^j  e  Q,  and  all  iterations  k.  (2.7) 

The  reversibility  condition  is  sufficient  for  a  unique  solution  to  exist  for  (2.5)  and  (2.6)  at 
each  iteration  k.  Ross  [1993,  pg  177]  shows  that  a  necessary  condition  for  reversibility  is 
multiplicativity  (i.e.,  for  any  three  solutions  h,i,j  eQ  such  that  c*  <  c,.  <  Cj ,  and  for  all 

iterations  k, 

(2-8) 

where  af,i(k,s^f)  is  the  probability  of  accepting  the  transition  fi-om  solution  h  to  solution  i 
at  iteration  k).  Reversibility  is  enforced  by  assuming  conditions  of  symmetry  on  the 
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solution  generation  probabilities  and  by  either  directly  expressing  the  acceptance 
probability  as  the  exponential  form,  or  by  requiring  the  multiplicative  condition  (2.8). 

The  homogeneous  proofs  of  convergence  in  the  literature  (implicitly  or  explicitly) 
require  condition  (2.8)  for  the  acceptance  fimction,  and  then  address  the  sufficient 
conditions  on  the  solution  generation  matrix.  For  example,  the  original  homogeneous 
proofs  of  convergence  (Aarts  and  Laarhoven  [1985]  and  Lundy  and  Mees  [1986])  require 
the  multiplicative  condition  for  the  acceptance  function,  and  then  assume  that  the  solution 
generation  function  is  symmetric  and  constant  for  all  iterations  k.  Rossier  et  al.  [1986] 
partition  the  solution  space  into  blocks  composed  of  neighboring  solutions  of  equal 
objective  function  value,  and  then  require  only  that  the  solution  generation  probabilities  be 
symmetric  between  the  blocks.  They  express  the  acceptance  fimction  as  a  ratio  of  the 
stationary  distribution  probabilities  (discussed  in  Section  2.1.3).  Faigle  and  Schrader 
[1988]  and  Faigle  and  Kern  [1991]  use  a  graph  theoretic  approach  to  relax  the  solution 
generation  function  symmetry  condition.  However,  they  require  that  the  solution 
acceptance  probability  fimction  satisfies  (2.8). 

Granville  et  al.  [1994]  propose  an  SA  procedure  for  filtering  binary  images,  where 
the  acceptance  function  is  based  on  the  probability  of  the  current  solution,  instead  of  the 
change  in  objective  function  value  as  in  basic  SA.  Their  probability  function  for  accepting 
a  candidate  solution  at  (outer  loop)  iteration  k  is  based  on  the  ratio  of  the  stationary 
probability  of  the  incumbent  solution  fi’om  iteration  ^  - 1 ,  versus  the  stationary  probability 
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of  an  initial  solution  (which  is  based  on  a  maximum  likelihood  estimate).  Their  acceptance 
probability  is 

5(i)  =  (911,(0) -!))♦<'> 

where  q  =  inf  tc,  /  sup  % ,  {q  must  also  be  estimated),  and  ^(k)  is  a  slowly  increasing 

yen 

function.  Therefore,  the  probability  of  a  solution  transition  does  not  consider  the  objective 
function  value  of  the  candidate  solution.  They  provide  a  proof  of  asymptotic  convergence 
of  this  approach,  but  note  that  their  proof  methodology  does  not  show  that  the  set  of 
globally  optimal  solutions  are  asymptotically  uniformly  distributed. 

2.1.3  Origins  of  the  Simulated  Annealing  Homogeneous  Theory 

SA  and  its  homogeneous  convergence  theory  are  based  on  the  work  of  Metropolis 
et  al.  [1953],  who  address  problems  in  equilibrium  statistical  mechanics  (Hammersley  and 
Handscomb  [1964,  pg  117]).  To  see  the  relationship,  consider  a  system  in  thermal 
equilibrium  with  its  surroundings,  in  solution  (state)  S  with  energy  F(S).  The  probability 
density  in  phase  space  of  the  point  representing  S  is  proportional  to 

exp(-<2F(5)),  (2.9) 

where  ®  =  (^0  ’  >  ^  is  Boltzmann’s  constant,  and  t  is  the  absolute  temperature  of  the 
surroundings.  Therefore  the  proportion  of  time  that  the  system  spends  in  solution  S  is 
proportional  to  (2.9)  (Hammersley  and  Handscomb  [1964]),  and  so  the  equilibrium 
probability  density  for  all  5  e  Q  is 


13 


(2.10) 


_  exp(-(BF(5)) 

j‘exp(-(8F(5))d;y’ 


The  expectation  of  any  valid  solution  function  f(S)  is  thus 


J/(^exp(-<BF(^)^ 

Jexp(-(BF(5))^iS 


(2.11) 


The  problem  is  that  for  many  solution  functions,  (2.11)  cannot  be  evaluated  analytically. 
Hammersley  and  Handscomb  [1964]  note  that  one  could  theoretically  use  naive  Monte 
Carlo  techniques  to  estimate  the  value  of  the  two  integrals  in  (2.11),  but  this  fails  in 
practice  because  the  exponential  factor  means  that  the  significant  portion  of  the  integrals 
are  concentrated  in  a  very  small  region  of  the  phase  space  Q .  The  remedy  is  to  use 
importance  sampling  (Bratley,  Fox,  and  Schrage  [1987,  chapter  2]),  by  generating 
solutions  with  probability  density  (2.10).  This  approach  would  also  seem  to  fail,  because 
of  the  integral  in  the  denominator  of  (2.10).  However,  Metropolis  et  al.  [1953]  solve  this 
problem  by  first  discretizing  the  solution  space,  such  that  the  integrals  in  (2.10)  and  (2.11) 
are  replaced  by  summations  over  discrete  solutions  y  6  Q .  They  then  construct  an 
irreducible  aperiodic  Markov  chain  with  transition  probabilities  such  that 


where 


for  all  yen, 

l€0 


(2.12) 


14 


for  all  j  eQ. 


(2.13) 


exp(-<BF(2)) 

ten 

Note  that  to  compute  the  equilibrium  distribution  tc,  the  denominator  of  (2.13)  (a 
normalizing  constant)  does  not  need  to  be  calculated.  Instead,  the  ratios  rCj  /  tc,  need 

only  be  computed  and  a  transition  matrix  P  defined  that  satisfies  (2.12).  Metropolis  et  al. 
[1953]  accomplish  this  by  defining  P  as  the  product  of  symmetric  solution  generation 
probabilities  g^  j ,  and  the  equilibrium  ratios  Ttj  /  Tt^  as 


bea. 


if  Ttj  /  TC,  <  1,  7  ^  / 
if  Tlj  /  TC,  >  1, 7  ^  /■ 

if  j  =  i 


(2.14) 


where 

Si,i  ^  Z^..y  =  1’  ^ ^  •  (2.15) 

/en 

The  use  of  stationary  probability  ratios  to  define  the  solution  acceptance  probabilities, 
combined  with  symmetric  solution  generation  probabilities,  enable  Metropolis  et  al.  [1953] 
to  use  the  reversibility  condition  (2.7)  to  show  that  (2.14)  and  (2.15)  satisfy  (2.12). 

2.1.4  Limitations  of  the  Reversibility  Condition 

Homogeneous  proofs  of  convergence  for  SA  become  more  difficult  when  the 
reversibility  condition  is  not  met.  Note  that  existence  of  a  unique  stationary  distribution 
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for  each  iteration  k  is  easily  shown  by  specifying  that  each  transition  matrix  P(k)  be 
irreducible  and  aperiodic.  On  the  other  hand,  it  becomes  very  difficult  to  derive  an  explicit 
closed-form  expression  for  each  stationary  distribution  7c(^)  that  remains  analytically 
tractable  as  the  problem’s  solution  space  becomes  large.  One  can  no  longer  use  (2.7)  to 
describe  each  stationary  distribution,  because  in  general,  the  multiplicative  condition  is  not 
met.  Instead,  one  must  directly  solve  the  system  of  equations  (2.5)  and  (2.6).  For 
example,  Davis  [1991]  attempts  to  obtain  a  closed-form  expression  for  Tt{k)  by  using 
Cramer’s  rule  and  rewriting  (2.5)  and  (2.6)  as 


:(*)(/ -P(i))  =  0 

(2.16) 

%ik)l^  = 1 , 

(2.17) 

where  boldface  type  indicates  vector/matrix  notation,  I  is  the  identity  matrix,  and  is  a 
colunrn  vector  of  ones.  Note  that  the  card{^y.card{Q)  transition  matrix  Pile) 

associated  with  (2.16)  is  of  rank  (carf/(Q)- 1)  (Cinlar  [1975,  pg  154).  Hence  by  deleting 
any  one  equation  from  (2.16),  and  substituting  (2.17),  the  result  is  the  set  of  cctrdiO) 
linearly  independent  equations 

%ik)il-  Pjk))  =  e  (2.18) 

where  the  square  matrix  (I-Pik))  is  obtmned  by  substituting  the  /*  column  of  the 
matrix  (/-  P(A:))  for  a  column  vector  of  ones.  The  vector  e  is  a  row  vector  of  zeroes, 
except  for  a  one  in  the  position.  Since  (/-  Pik))  is  of  full  rank,  then  its  determinant 
(written  as  det(/  -  Pik))),  is  nonzero.  Let  (/ -  Pik)y  be  the  same  matrix  as  il- Pik)) 
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except  that  the  elements  of  the  /*  row  of  {I-P(k))  are  replaced  by  the  vector  e. 
Therefore,  for  all  iterations  k, 


...  det(/-J»W)^ 

del(/-f(;t))  ’ 


for  all  j  eQ. 


(2.19) 


Davis  attempts  to  solve  (2.19)  for  each  y  eQ  via  a  multivariate  Taylor  series  expansion  of 
each  determinant,  but  is  not  able  to  derive  a  tractable  result. 

Anily  and  Federgruen  [1987]  use  perturbation  analysis  techniques  (see  e.g.,  Meyer 
[1980])  to  prove  convergence  of  a  particular  stochastic  hill-climbing  algorithm,  by 
bounding  the  deviations  of  the  sequence  of  stationary  distributions  of  the  particular  case 
against  the  sequence  of  known  stationary  distributions  corresponding  to  the  SA  algorithm. 
In  general,  this  convergence  proof  approach  is  only  useful  for  a  restrictive  class  of  SA 
algorithms,  because  the  transition  matrix  condition  number  grows  explosively  as  the 
number  of  iterations  k  becomes  large;  this  issue  is  addressed  in  Chapter  4. 

Overall,  the  difficulty  of  explicitly  expressing  the  stationary  distributions  for  large 
problem  solution  spaces,  combined  with  problems  of  bounding  the  transition  matrix 
condition  number  for  large  k,  suggest  that  in  general  it  is  very  difficult  to  prove 
asymptotic  convergence  of  the  SA  algorithm  by  treating  (2.5)  and  (2.6)  as  a  linear  algebra 
problem. 


2.1.5  Convergence  Rates  for  the  Homogeneous  Approach 

Lundy  and  Mees  [1986]  note  that  for  each  fixed  outer  loop  k,  convergence  to  the 
solution  equilibrium  probability  distribution  vector  %{k)  (in  terms  of  the  Euclidean 
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distance  between  P^'"\k)  and  K(k) ,  as  tm  ^  qo)  is  geometric  since  the  solution  space  is 
finite,  and  the  convergence  factor  is  given  by  the  second  largest  eigenvalue  of  the 
transition  matrix  P(k).  This  result  is  based  on  a  standard  convergence  theorem  for 
irreducible,  aperiodic  homogeneous  Markov  chains  (see  e.g.,  Cinlar  [1975,  pg  378]). 
Note  that  the  large  solution  space  precludes  practical  calculation  of  the  eigenvalues. 
Lundy  and  Mees  [1986]  conjecture  that  when  the  temperature  is  near  zero,  the  second 
largest  eigenvalue  will  be  close  to  one  for  problems  with  local  optima,  and  thus 
convergence  to  the  equilibrium  distribution  will  be  very  slow.  (Recall  that  the  dominant 
eigenvalue  for  P(k)  is  one,  with  algebraic  multiplicity  one  (Isaacson  and  Madsen  [1976,  pg 
125]).)  Lundy  and  Mees  [1986]  use  their  conjecture  to  justify  why  SA  should  be  initiated 
with  a  relatively  high  temperature.  For  an  overview  of  current  methods  for  assessing 
nonasymptotic  rates  of  convergence  for  general  homogeneous  Markov  chains,  see 
Rosenthal  [1995]. 

Overall,  the  assumption  of  stationarity  for  each  iteration  k  limits  practical 
application  of  homogeneous  theory— Romeo  and  Sangiovanni-Vincentelli  [1991]  show 
that  if  equilibrium  (for  a  Markov  chain  that  satisfies  the  reversibility  condition)  is  reached 
in  a  finite  number  of  steps,  then  it  is  achieved  in  one  step.  Thus,  Romeo  and  Sangiovanni- 
Vincentelli  [1991]  conjecture  that  there  is  essentially  no  hope  for  the  most-used  versions 
of  SA  to  reach  equilibrium  in  a  finite  number  of  iterations. 
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2.1.6  Inhomogenous  Markov  Chain  Theory 

The  second  convergence  approach  (Anily  and  Federgruen  [1987],  Borkar  [1992], 
Connors  and  Kumar  [1989],  Gidas  [1985],  Hajek  [1988],  and  Mitra  et  al.  [1986]),  is 
based  on  inhomogeneous  Markov  chain  theory.  In  this  approach,  the  Markov  chain  need 
not  reach  a  stationary  distribution  (e.g.,  the  SA  inner  loop  need  not  be  infinitely  long)  for 
each  k.  On  the  other  hand,  an  infinite  sequence  of  (outer  loop)  iterations  k  must  still  be 
examined,  with  the  condition  that  the  temperature  parameter  cools  sufficiently  slowly. 
The  proof  given  by  Mitra  et  al.  [1986]  is  based  on  satisfying  the  inhomogeneous  Markov 
chain  conditions  of  weak  and  strong  ergodicity  (Isaacson  and  Madsen  [1976],  Seneta 
[1981]).  The  proof  requires  four  conditions: 

a)  The  inhomogeneous  SA  Markov  chain  must  be  weakly  ergodic  (i.e., 
dependence  on  the  initial  solution  vanishes  in  the  limit). 

b)  An  eigenvector  7i(A:)  with  eigenvalue  one  must  exist  such  that  (2.5)  and  (2.6) 
hold  for  every  iteration  k. 

c)  The  Markov  chain  must  be  strongly  ergodic  (i.e.,  the  Markov  chain  must  be 
weakly  ergodic  and  the  sequence  of  eigenvectors  nik)  must  converge  to  a 
limiting  form),  e.g., 

XH^)-7i:(^  +  1)|1<»- 

t=o 
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d)  The  sequence  of  eigenvectors  must  converge  to  a  form  where  all  probability 
mass  is  concentrated  on  the  set  of  globally  optimal  solutions  opt  e  £1°^  c  Q , 
such  that  =  min(c,. ).  Hence, 

lim  7c(^)  =  . 

i-»<o 

(Note  that  weak  and  strong  ergodicity  are  equivalent  for  homogeneous  Markov  chain 
theory.) 

Mitra  et  al.  [1986]  satisfy  condition  a)  (weak  ergodicity)  by  first  forming  a  lower 
bound  on  the  probability  of  reaching  any  solution  fi-om  any  locally  minimal  solution,  and 
then  showing  that  this  bound  does  not  approach  zero  too  quickly.  For  example,  they 
define  the  lower  bound  for  the  SA  transition  probabilities  (2.4)  as 

(A:)  ^  w'”  exp(-/w A i  )  , 

where  m  is  the  number  of  transitions  needed  to  reach  any  solution  fi’om  any  solution  of 
non-maximal  objective  function  value,  w  >  0  is  a  lower  bound  on  the  one-step  solution 
generation  probabilities,  and  is  the  maximum  one-step  increase  in  objective  function 
value  between  any  two  solutions.  Mitra  et  al.  [1986]  show  the  Markov  chain  is  weakly 
ergodic  if 

00 

w"  exp(-WAi  /  /^_i)  =  Qo .  (2.20) 

*=*o 
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Therefore,  weak  ergodicity  is  obtained  if  the  temperature  is  reduced  sufficiently  slowly 
to  satisfy  (2.20).  In  general,  the  (infinite)  sequence  of  temperatures  (4},  k  =  l,2,... 
needs  to  satisfy 


log(^) 


(2.21) 


where  lim/t  =0,  P  is  a  problem-dependent  constant,  and  k  is  the  number  of  iterations. 

Mitra  et  al.  [1986]  show  that  conditions  b),  c),  and  d)  are  satisfied  by  using  the 
homogeneous  Markov  chain  theory  developed  for  the  transition  probabilities  (2.4),  and 
assuming  a  condition  of  symmetry  on  the  solution  generation  Sanction. 

Romeo  and  Sangiovanni-Vincentelli  [1991]  comment  that  while  the  logarithmic 
cooling  schedule  (2.21)  is  a  sufficient  condition  for  the  convergence,  there  are  only  a  few 
values  of  p  which  make  the  logarithmic  rule  also  necessary.  Furthermore,  there  exists  a 
unique  choice  of  P  which  makes  the  logarithmic  rule  both  necessary  and  sufficient  for  the 
convergence  of  SA  to  the  set  of  global  optima.  Hajek  [1988]  was  the  first  to  show  that 
the  logarithmic  cooling  schedule  (2.21)  is  both  necessary  and  sufficient,  by  developing  the 
tightest  bound  for  the  parameter  p.  He  defines  P  to  be  greater  than  or  equal  to  the  depth 
of  the  deepest  local  minimum  which  is  not  a  global  minimum,  under  a  weak  reversibility 
assumption.  Hajek  defines  a  Markov  chain  to  be  weakly  reversible  if,  for  any  pair  of 
solutions  i,j  eQ  and  for  any  nonnegative  real  number  h,  i  is  reachable  at  height  h  from  j 
if  and  only  if  j  is  reachable  at  height  h  from  /.  Note  that  Hajek  does  not  attempt  to  satisfy 
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the  conditions  of  weak  and  strong  ergodicity,  but  instead  uses  a  more  general  probabilistic 
approach  to  develop  a  lower  bound  on  the  probability  of  escaping  local,  but  not  global 
optima.  Hajek’s  necessary  and  sufiBcient  conditions  are  substantiated  by  Connors  and 
Kumar  [1989],  who  base  their  convergence  proof  on  a  concept  they  call  orders  of 
recurrence, 

5,  =  sup|  X  >  0 :  ^  exp(-x  /  )7t,  (^)  =  qo|  ,  for  all  /  e  Q . 

They  show  that  these  orders  of  recurrence  quantify  the  asymptotic  behavior  of  each 
solution’s  occupation  probability.  Their  key  result  is  that  the  SA  inhomogeneous  Markov 
chain  converges  in  a  Cesaro  sense  to  the  set  of  solutions  having  the  largest  recurrence 
orders.  Borkar  [1992]  improves  on  Connors  and  Kumar’s  [1989]  convergence  result  by 
using  a  convergence/oscillation  dichotomy  result  for  martingales.  Tsitsiklis  [1989]  uses 
bounds  and  estimates  for  singularly  perturbed,  approximately  stationary  Markov  chains  to 
develop  a  convergence  theory  that  subsumes  Hajek’s  [1988]  condition  of  weak 
reversibility.  (Note  that  Tsitsiklis  [1989]  defines  N(h)  c  Q  as  the  set  of  all  local  minima 
(in  terms  of  objective  function  value)  of  depth  h  +  \  ox  more.  Hence  P  is  the  smallest  h 
such  that  all  local  (but  not  global)  minima  have  depth  h  or  less.  Tsitsiklis  [1989] 
conjectures  that  without  some  form  of  reversibility,  there  will  exist  no  h  such  that  the 
global  optima  are  contained  in  the  set  of  local  optima.)  Note  that  Chiang  and  Chow 
[1988],  [1994],  Borkar  [1992],  Connors  and  Kumar  [1989],  Hajek  [1988],  and  Mitra  et 
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al.  [1986]  all  require  (either  explicitly  or  implicitly)  the  multiplicative  condition  (2.8)  for 
their  proofs  of  convergence. 

Anily  and  Federgruen  [1987]  present  a  proof  of  convergence  of  SA  with  more 
general  acceptance  probability  functions.  Using  inhomogeneous  Markov  chain  theory, 
they  prove  convergence  under  the  following  necessary  and  sufficient  conditions: 

a)  The  acceptance  probability  function  must,  for  any  iteration  k,  allow  any  hill 
climbing  transition  to  occur  with  positive  probability. 

b)  The  acceptance  probability  fiinction  must  be  bounded  and  asymptotically 
monotone,  with  limit  zero  for  hill  climbing  solution  transitions. 

c)  In  the  limit,  the  stationary  probability  distribution  must  have  zero  probability 
mass  for  every  non-globally  optimal  solution. 

d)  The  probability  of  escaping  from  any  locally  (but  not  globally)  optimal  solution 
must  not  approach  zero  too  quickly. 

Anily  and  Federgruen  [1987]  use  condition  c)  to  relax  the  acceptance  function 
multiplicative  condition  (2.8).  However,  in  practice  condition  c)  would  be  very  difficult  to 
check  without  assuming  that  (2.8)  holds,  for  the  reasons  discussed  in  Section  2.1.4. 
Condition  d)  provides  the  necessary  condition  for  the  rate  that  the  probability  of  hill 
climbing  transitions  approaches  zero.  Condition  d)  is  expressed  quantitatively  as  follows: 
let  be  specified  by  (2.2),  and  define  the  minimum  one-step  acceptance  probability  as 
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a(h)=  mm  a, 

ted,  ■' 

>eW(<) 

Define  the  set  of  local  optima  LcCl  such  that  i  <  Cj  for  all  7  e  (5V(/)  n  O) , 

and  let 

a(/t)=  max 

let,  ' 

j&N-(i)\L 

Finally,  let  any  solution  7  eQ  be  reachable  fi’om  any  solution  /  eO  in  ^  transitions  or 
less.  Then  if  (non-globally)  locally  optimal  solutions  exist,  and 

i;(a(<.))'=«.  (2.22) 

t=i 

and  conditions  a),  b),  and  c)  hold,  then  the  SA  algorithm  will  asymptotically  converge  to 
the  set  of  global  optima,  with  probability  one.  However,  if  (non-globally)  locally  optimal 
solutions  exist  and 

|;a((,)<=0,  (2.23) 

then  the  probability  of  each  solution  is  asymptotically  dependent  upon  the  initial  solution, 
and  therefore  the  SA  algorithm  will  not  always  converge  to  the  set  of  global  optima  with 
probability  one. 

The  inhomogeneous  proof  concept  is  stronger  than  the  homogeneous  approach  in 
that  it  provides  necessary  conditions  for  the  rate  of  convergence,  but  its  asymptotic  nature 
suggests  that  practical  implementation  is  infeasible.  Romeo  and  Sangiovanni-Vincentelli 
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[1991]  note  that  "there  is  no  reason  to  believe  that  truncating  the  logarithmic  temperature 
sequence  would  yield  a  good  configuration,  since  the  tail  of  the  sequence  is  the  essential 
ingredient  in  the  proof."  In  addition,  the  logarithmic  cooling  schedule  dictates  a  very  slow 
rate  of  convergence.  Hence  most  recent  work  has  focused  on  methods  of  improving  S  A's 
finite-time  behavior  and  modifying  or  blending  the  algorithm  with  other  search  methods 
such  as  genetic  algorithms  (Liepins  and  Hilliard  [1989]),  tabu  search  (Glover  [1994]),  or 
both  (Fox  [1993]). 

2.1.7  Finite-Time  Behavior 

Mitra  et  al.  [1986]  were  the  first  to  present  a  bound  on  the  distance  of  the  actual 
solution  probability  distribution  from  the  optimal  distribution  after  a  finite  number  of 
iterations.  They  found  that  for  a  large  number  of  iterations  k,  the  Lj-norm  of  the 
difference  of  the  current  probability  distribution  from  the  optimal  distribution  is 
0(i/;j.min(M)),  where  6  and  are  values  characteristic  of  the  problem  instance.  The 

values  b  and  d  are  themselves  functions  of  several  parameters  that  describe  the  solution 
space  topology,  including  the  connectivity  of  the  graph  of  the  Markov  chain,  the  maximum 
one-step  hill  climb  between  neighboring  solutions,  the  cooling  schedule,  and  on  other 
parameters  that  are  themselves  functions  of  both  solution  objective  function  values  and 
connectivity  between  solutions. 

Implementation  Issues:  Implementation  of  SA  follows  two  paths— that  of 
specifying  problem-specific  choices  (neighborhood,  objective  function,  and  constraints), 
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and  that  of  specifying  generic  choices  (generation  and  acceptance  probability  functions, 
and  the  cooling  schedule)  (Eglese  [1990]).  The  principal  shortcoming  of  SA  is  that  it 
often  requires  extensive  computer  time.  Implementation  work  generally  strives  to  retain 
SA's  asymptotic  convergence  character,  but  at  reduced  computer  run-time.  The  methods 
discussed  here  are  mostly  heuristic. 

Problem-Specific  Choices: 

Neighborhoods:  A  key  problem-specific  choice  concerns  the  neighborhood 
definition.  SA's  efficiency  appears  to  be  influenced  by  the  neighborhood  structure  used 
(Moscato  [1993]).  The  choice  of  neighborhood  serves  to  enforce  a  topology—Eglese 
[1990]  reports  that  "a  neighborhood  structure  which  imposes  a  'smooth'  topology  where 
the  local  minima  are  shallow  is  preferred  to  a  'bumpy'  topology  where  there  are  many  deep 
local  minima."  Solla,  Sorkin  and  White  [1986]  report  similar  conclusions,  as  does 
Fleischer  and  Jacobson  [1996],  This  makes  intuitive  sense,  and  it  supports  Hajek's  results, 
which  show  that  asymptotic  convergence  to  the  set  of  global  optima  depends  on  the  depth 
of  the  local  minima. 

Another  factor  is  neighborhood  size.  No  theoretical  results  are  available,  other 
than  the  necessity  of  reachability  (in  a  finite  number  of  steps)  from  any  solution  to  any 
other  solution.  Cheh  et  al.  [1991]  report  that  small  neighborhoods  are  best,  while  Ogbu 
and  Smith  [1990]  provide  evidence  that  larger  neighborhoods  perform  better  than  smaller 
neighborhoods.  Goldstein  and  Waterman  [1988]  conjecture  that  if  the  neighborhood  size 
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is  small  compared  to  the  total  solution  space  cardinality,  then  the  Markov  chain  cannot 
move  around  the  solution  space  fast  enough  to  find  the  minimum  in  a  reasonable  time.  On 
the  other  hand,  a  very  large  neighborhood  has  the  algorithm  merely  sampling  randomly 
fi'om  a  large  portion  of  the  solution  space,  and  thus  unable  to  focus  on  specific  areas  of  the 
solution  space.  It  is  reasonable  to  believe  that  neighborhood  size  is  heavily  problem- 
specific,  such  that  problems  whose  topology  smoothness  is  relatively  insensitive  to 
different  neighborhood  definitions,  may  benefit  from  larger  neighborhood  sizes. 

Fleischer  [1994]  and  Fleischer  and  Jacobson  [1996]  use  information  theoretic 
concepts  to  show  that  the  neighborhood  structure  can  affect  the  information  rate  or  total 
uncertainty  associated  with  SA.  Fleischer  [1994]  shows  that  SA  tends  to  perform  better 
as  the  entropy  level  of  the  associated  Markov  chain  increases,  and  thus  conjectures  that  an 
entropy  measure  could  be  useful  for  predicting  when  SA  would  perform  well  on  a  given 
problem.  However,  efficient  ways  of  estimating  the  entropy  are  needed  to  make  the 
concept  practical. 

A  final  issue  on  neighborhood  definition  addresses  the  solution  space  itself 
Chardaire  et  al.  [1995]  propose  a  method  for  addressing  0-1  optimization  problems,  in 
which  the  solution  space  is  progressively  reduced  by  fixing  the  value  of  strongly  persistent 
variables  (which  have  the  same  value  in  all  optimal  solutions).  They  isolate  the  persistent 
variables  during  SA’s  execution  by  periodically  estimating  the  expectation  of  the  random 
variable  (a  vector  of  binary  elements)  that  describes  the  current  solution,  and  fixing  the 
value  of  those  elements  in  the  random  variable  that  meet  threshold  criteria. 
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Objective  Functions:  Another  problem-specific  choice  involves  the  objective 
function  specification.  Stem  [1992]  recommends  a  heuristic  temperature-dependent 
penalty  function  as  a  substitute  for  the  actual  objective  fimction  for  problems  where  low 
cost  solutions  have  neighbors  of  much  higher  cost,  or  in  cases  of  degeneracy  (i.e.,  large 
neighborhoods  of  solutions  of  equal,  but  high  costs).  The  original  objective  function 
surfaces,  as  the  penalty  and  the  temperature  are  gradually  reduced  to  zero.  One  speed-up 
technique  is  to  evaluate  only  the  difference  in  objective  functions,  a^j,  instead  of 
calculating  both  c,  and  c^.  Tovey  [1988]  suggests  several  methods  of  approximating 
by  using  surrogate  functions  (faster  to  evaluate  than  a,j,  but  not  as  accurate) 
probabilistically  for  cases  when  evaluation  of  a,j  is  expensive.  He  calls  this  technique  the 
surrogate  Junction  swindle. 

Straub  et  al.  [1995]  improve  SA’s  performance  on  problems  in  chemical  physics  by 
using  the  classical  density  distribution  instead  of  the  molecular  dynamics  of  single  point 
particles  to  describe  the  potential  energy  landscape.  Ma  and  Straub  [1994]  report  that  this 
has  the  effect  of  smoothing  the  energy  landscape  by  reducing  the  number  and  depth  of 
local  minima. 

Yan  and  Mukai  [1992]  consider  the  case  when  a  closed-form  formula  for  the 
objective  function  is  not  available.  They  use  a  probabilistic  simulation  to  generate  a 
sample  objective  fimction  value  for  an  input  solution,  and  then  accept  the  solution  if  the 
sample  objective  function  value  falls  within  a  predetermined  bound.  They  provide  a  proof 
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of  asymptotic  convergence  by  extrapolating  the  convergence  proofs  for  SA,  and  analyze 
the  rate  of  convergence. 

SA  Generic  Choices: 

Generation  Probability  Functions:  Generation  probability  functions  are  usually 
chosen  as  a  uniform  distribution  with  probabilities  proportional  to  the  size  of  the 
neighborhood.  The  generation  probability  function  is  usually  not  temperature-dependent. 
Fox  [1993]  suggests  that  instead  of  blindly  generating  neighbors  uniformly,  adopt  an 
intelligent  generation  mechanism  that  modifies  the  neighborhood  and  its  probability 
distribution  to  accommodate  search  intensification  or  diversification,  in  the  spirit  of  the 
tabu  search  heuristic.  Fox  notes  that  SA  convergence  theory  does  not  preclude  this  idea. 
Tovey  [1988]  suggests  an  approach  with  a  similar  efFect~he  calls  it  the  neighborhood 
prejudice  swindle. 

Acceptance  Probability  Functions:  The  literature  shows  considerable  experimen¬ 
tation  with  acceptance  probability  functions  for  hill  climbing  transitions.  The  most  popular 
is  the  exponential  form  (2.1). 

Ogbu  and  Smith  [1990]  consider  replacing  the  basic  SA  acceptance  function 
a,  ^  (A:,A,  y)  with  a  geometrically  decreasing  form  that  is  independent  of  the  change  in 

objective  function  value.  They  adopt  a  probabilistic-exhaustive  heuristic  technique  in 
which  randomly  chosen  neighbors  of  a  solution  are  examined  and  all  solutions  that  are 
accepted  are  noted,  but  only  the  last  solution  accepted  becomes  the  new  incumbent.  Their 
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hope  is  that  this  scheme  will  explore  a  more  broad  area  of  the  solution  space  of  a  problem. 
Their  acceptance  probability  function  is  defined  for  all  solutions  i,j  eQ  and  for  k  = 
l,2,-,^as 


a(l)x*-’ 

1 


if  Cj  >  c, 
otherwise 


where  a(l)  is  the  initial  acceptance  probability  value,  x  (<1)  is  the  reducing  factor,  and  K  is 
the  number  of  stages  (equivalent  to  a  temperature  cooling  schedule).  They  experimented 
with  this  method  (and  a  neighborhood  of  large  cardinality)  on  a  permutation  flowshop 
problem,  and  report  that  their  approach  found  comparable  solutions  to  the  basic  SA 
algorithm  in  one-third  the  time. 

Cooling  Schedules:  An  S  A  cooling  schedule  is  defined  by  an  initial  temperature,  a 
schedule  for  reducing  temperature,  and  a  stopping  criterion.  Romeo  and  Sangiovanni- 
Vincentelli  [1991]  note  that  an  effective  schedule  is  essential  to  reducing  the  amount  of 
time  required  by  the  algorithm.  Therefore  much  of  the  literature  (Cardoso  et  al.  [1994], 
Fox  and  Heine  [1993])  is  devoted  to  this  topic. 

Homogeneous  SA  convergence  theory  is  often  used  to  design  good  cooling 
schedules:  start  with  an  initial  temperature  for  which  a  good  approximation  of  the 
stationary  distribution  7c(/o)  is  quickly  reached.  Reduce  by  an  amount  b{t)  small 
enough  such  that  7t(/o)  is  a  good  starting  point  to  approximate  -6(0).  Fbc  the 
temperature  at  a  constant  value  during  the  iterations  needed  for  the  solution  distribution  to 
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approximate  7c(^o -5(0)  •  Repeat  the  above  process  of  cooling  and  iterating  until  no 
further  improvement  seems  possible  (Romeo  and  Sangiovanni-Vincentelli  [1991]). 

Cooling  schedules  are  grouped  into  two  classes:  static  schedules,  which  must  be 
completely  specified  before  the  algorithm  begins;  and  adaptive  schedules,  which  adjust  the 
temperature's  rate  of  decrease  from  information  obtained  during  the  algorithm's  execution. 
Cooling  schedules  are  almost  always  heuristic;  they  seek  to  balance  moderate  execution 
time  with  SA's  dependence  on  asymptotic  behavior. 

Strenski  and  Kirkpatrick  [1991]  present  an  exact  (non-heuristic)  characterization 
of  finite-length  annealing  schedules.  They  consider  extremely  small  problems  that 
represent  features  (local  optima  and  smooth/hilly  topologies),  and  solve  for  solution 
probabilities  after  a  finite  number  of  iterations.  They  find  that  optimal  cooling  schedules 
are  not  monotone  decreasing  in  temperature.  Another  result  is  that,  for  their  test  problem 
(a  white  noise  surface),  geometric  and  linear  annealing  schedules  perform  better  than 
inverse  logarithmic  schedules,  when  sufiBcient  computing  effort  is  allowed.  Their 
experiments  do  not  show  measurable  performance  differences  between  linear  and 
geometric  schedules.  They  also  find  that  geometric  schedules  don't  suffer  seriously  when 
initial  temperatures  are  set  too  high.  Their  technique  does  not  allow  simulating  adaptive 
schedules,  but  they  do  check  some  of  the  underlying  assumptions.  Their  results  show  that 
the  even  the  most  robust  adaptive  schedule  "produces  annealing  trajectories  which  are 
never  in  equilibrium”  (Strenski  and  Kirkpatrick  [1991]).  However,  they  also  conclude  that 
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the  transition  acceptance  rate  is  not  sensitive  to  the  degree  of  closeness  to  an  equilibrium 
distribution. 

Christoph  and  Hoffinann  [1993]  also  attempt  to  characterize  optimal  annealing 
schedules.  They  derive  a  relationship  between  a  finite  sequence  of  optimal  temperature 
values  and  the  number  of  iterations  at  each  respective  temperature  for  several  small  test 
problems  to  reach  optimality  (minimal  mean  final  energy).  They  find  that  this  scaling 
behavior  is  of  the  form 

(2.24) 

where  a  and  b  are  scaling  coefficients,  x„  is  referred  to  as  temperature,  v  is  the 

number  of  iterations  at  temperature  x„,  and  « is  the  number  of  times  the  temperature  is 
reduced.  Their  approach  is  to  solve  for  the  coefficients  a  and  b  based  on  known 
temperature  and  iteration  parameter  values  for  an  optimal  schedule  for  a  few  total 
annealing  steps,  and  then  use  (2.24)  to  predict  the  optimal  annealing  schedule  for 
intermediate  times.  They  make  no  suggestions  on  how  to  efficiently  solve  for  the 
necessary  optimal  schedules  for  a  (typically  large)  problem  instance. 

Romeo  and  Sangiovanni-Vincentelli  [1991]  conclude  that  the  theoretical  results 
obtained  thus  far  have  not  been  able  to  explmn  why  SA  is  so  successful  even  when  wild 
static  annealing  schedule  heuristics  are  used.  They  conjecture  that  the  neighborhood  and 
the  corresponding  topology  of  the  objective  function  are  responsible  for  the  behavior  of 
the  algorithm. 


32 


2.2  Threshold  Accepting 

Questioning  the  very  need  for  a  randomized  acceptance  function,  Dueck  and 
Scheuer  [1990],  and  independently,  Moscato  and  Fontanari  [1990]  propose  the  concept  of 
threshold  accepting  (TA),  where  the  acceptance  function  is  defined  as 


if  Qk 

otherwise 


where  is  the  threshold  value  at  iteration  k.  is  usually  defined  as  a  deterministic, 
nonincreasing  step  function.  Dueck  and  Scheuer  [1990]  report  dramatic  improvements  in 
traveling  salesman  problem  solution  quality  and  algorithm  run-time  over  basic  SA. 
Moscato  and  Fontanari  [1990]  report  more  conservative  results-they  conjecture  that  SA's 
probabilistic  acceptance  function  does  not  play  a  major  role  in  the  search  for  near-optimal 
solutions. 


Althofer  and  Koschnick  [1991]  develop  a  convergence  theory  for  threshold 
accepting  based  on  the  concept  that  SA  belongs  to  the  convex  hull  of  TA.  Their  idea  is 
that  (for  a  finite  threshold  sequence)  there  can  exist  only  finitely  many  TA  transition 
matrices;  but  SA  can  have  infinitely  many  transition  matrices  because  of  the  real-valued 
nature  of  the  temperature  at  each  iteration.  However,  every  SA  transition  matrix  for  a 
given  problem  can  be  represented  as  a  convex  combination  of  the  finitely  many  TA 
transition  matrices.  Althofer  and  Koschnick  [1991]  are  unable  to  prove  that  TA  will 
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asymptotically  reach  a  global  minimum,  but  they  do  prove  the  existence  of  threshold 
schedules  that  provide  convergence  to  within  an  e-neighborhood  of  the  optimal  solutions. 

Hu  et  al.  [1995]  modify  TA  to  include  a  non-monotonic,  self-tuning  threshold 
schedule  in  the  hope  of  improving  finite-time  performance.  They  allow  the  threshold 
Q^to  change  dynamically  up  or  down,  based  on  the  perceived  likelihood  of  being  near  a 
local  minimum.  This  is  accomplished  by  using  a  principle  they  call  dwindling  expectation- 
-when  the  algorithm  fails  to  move  to  neighboring  solutions,  the  threshold  is  gradually 
increased,  in  the  hope  of  eventually  escaping  a  local  optimum.  Conversely,  when  solution 
transitions  are  successful,  the  threshold  is  reduced,  in  order  to  explore  local  optima.  Their 
experimental  results  on  two  traveling  salesman  problems  showed  that  their  algorithm 
outperformed  TA  in  terms  of  finding  good  solutions  earlier  in  the  optimization  process. 

TA’s  greatest  values  are  its  ease  of  implementation  and  its  generally  faster 
execution  time  than  SA,  due  to  the  reduced  computational  effort  fi'om  avoiding 
acceptance  probability  calculations  and  generation  of  random  numbers  (Moscato  and 
Fontanari  [1990]).  Compared  to  SA,  relatively  few  TA  applications  are  reported  in  the 
literature  (Lin  et  al.  [1995],  Scheermesser  and  Bryngdahl  [1995],  and  Nissen  and  Paul 
[1995]). 
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2.3  Other  Stochastic  Approaches 


2.3.1  Probabilistic  Tabu  Search 

Tabu  search  (Glover  [1994])  is  a  general  frameAvork  for  a  variety  of  iterative  local 
search  strategies  for  discrete  optimization.  Tabu  search  (TS)  uses  the  concept  of  memory 
by  controlling  the  algorithm’s  execution  via  a  dynamic  list  of  forbidden  moves.  This 
allows  the  TS  algorithm  to  intensify  or  diversify  its  search  of  a  given  (Q,5V,c)  problem’s 

solution  space  as  necessary,  in  an  effort  to  avoid  entrapment  in  local  optima.  Note  that  a 
criticism  of  SA  is  that  it  is  completely  memoryless,  i.e.,  SA  disregards  all  historical 
information  gathered  during  the  algorithm’s  execution.  On  the  other  hand,  no  proofs  of 
convergence  exist  in  the  literature,  for  the  general  TS  algorithm  proposed  by  Glover. 
Therefore,  Faigle  and  Kern  [1992]  propose  a  particular  TS  algorithm  they  call 
probabilistic  tabu  search,  which  attempts  to  capitalize  on  both  the  asymptotic  optimality 
of  SA  and  the  memory  feature  of  TS.  In  probabilistic  TS,  the  probabilities  of  generating 
and  accepting  each  candidate  solution  are  set  as  functions  of  both  a  temperature  parameter 
(as  in  SA)  and  information  gained  in  previous  iterations.  Faigle  and  Kern  are  then  able  to 
prove  asymptotic  convergence  of  their  particular  TS  algorithm,  by  using  methods 
developed  for  SA  (Faigle  and  Kern  [1991]). 

2.3.2  Random  Noise 

Charon  and  Hudry  [1993]  advocate  a  simple  descent  algorithm  they  call  the 
Noising  Method.  The  algorithm  first  perturbs  the  solution  space  by  adding  random  noise 
to  the  problem's  objective  function  values.  The  noise  gradually  reduced  to  zero  during  the 
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algorithm’s  execution,  allowing  the  original  problem  structure  to  reappear.  Charon  and 
Hudry  provide  computational  results,  but  do  not  prove  that  the  algorithm  will 
asymptotically  converge  to  the  set  of  globally  optimal  solutions. 

Storer  et  al.  [1992]  propose  an  optimization  strategy  for  sequencing  problems,  by 
integrating  fast,  problem-specific  heuristics  with  local  search.  Their  key  contribution  is  to 
base  the  definition  of  the  search  neighborhood  on  a  heuristic  problem  pair  (h,  p),  where  h 
is  a  fast,  known,  problem-specific  heuristic  and  p  represents  the  problem  data.  By 
perturbing  the  heuristic,  the  problem,  or  both,  a  neighborhood  of  solutions  is  developed. 
This  neighborhood  then  forms  the  basis  for  local  search.  Their  hope  is  that  the 
perturbations  will  cluster  good  solutions  close  together,  thus  making  it  easier  to  perform 
local  search. 

2.3.3  Genetic  Algorithms 

Genetic  algorithms  (Liepins  and  Hilliard  [1989])  emulate  the  evolutionary  behavior 
of  biological  systems.  They  generate  a  sequence  of  populations  of  candidate  solutions  to 
the  underlying  optimization  problem  by  using  a  set  of  genetically  inspired  stochastic 
solution  transition  operators  to  transform  each  population  of  candidate  solutions  into  a 
descendent  population.  The  three  most  popular  transition  operators  are  reproduction, 
cross-over,  and  mutation  (Davis  [1991]).  Davis  [1991]  and  Rudolph  [1994]  attempt  to 
use  homogeneous  finite  Markov  chain  techniques  to  prove  convergence  of  genetic 
algorithms,  but  are  unable  to  develop  a  theory  comparable  in  scope  to  that  of  S  A. 
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CHAPTER  3:  GENERALIZED  HILL  CLIMBING 
CONVERGENCE  BASED  ON 
REVERSIBLE  MARKOV  CHAIN  THEORY 


Chapter  3  provides  a  proof  of  convergence,  and  examples,  for  a  class  of  GHC  al¬ 
gorithms.  When  the  objective  function  value  of  a  globally  optimal  solution  is  known  (and 
the  goal  is  to  identify  an  associated  optimal  solution),  then  the  solution  acceptance  prob¬ 
ability  for  all  i,j  €  Q  can  be  expressed  as  a  ratio  of  acceptance  probabilities  involving  the 


globally  optimal  solution.  This  enables  the  proof  of  convergence  to  use  reversible  homo¬ 
geneous  Markov  chain  theory. 

3.1  Generalized  Hill  Climbing  Class  Description 

Consider  a  class  of  GHC  algorithms  where,  for  all  i,j  g  Q  and  opt  g  €1°’* ,  the  solu¬ 
tion  acceptance  probabilities  are  defined  as 


mim 


Vx[R,{opt,^)>^,^^) 


(3.1) 


where  R^(opt,i)  and  R^(opt,j)  are  random  variables.  The  elements  of  the  transition 
matrix  P(k)  are  defined  as 


1-  ZPuW 


l€9r{i) 

Hi 


for  all  i  eQjs  j  ^  i 

j  =  i 


[0  otherwise, 

and  the  solution  generation  probabilities  gi  j(k)  satisfy 


(3.2) 
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(3.3) 


Note  that  from  (3.1),  this  class  of  GHC  algorithms  requires  that  the  globally  optimal  ob¬ 
jective  function  value,  ,  be  known. 

3.2  Proof  of  Convergence 

Theorem  3.1  provides  sufficient  conditions  on  this  class  of  GHC  algorithms  for  the 
existence  of  a  unique  stationary  distribution  for  each  iteration  k.  Theorem  3.2  provides 
sufficient  conditions  for  the  sequence  of  stationary  distributions  Tz{k)  to  converge  to  the 
set  of  globally  optimal  solutions. 

Theorem  3.1:  Let  (Q,W,c)  denote  cm  instance  of  a  discrete  optimization  prob¬ 
lem.  Let  each  GHC  transition  probability  Pfjik)  be  defined  by  (3.2),  where  the  accep¬ 
tance  probabilities  are  defined  by  (3.1).  Let  the  generation  probabilities  gf  j(k)  satisfy 
(3.3)  and  the  conditions 

(a)  for  all  ij  e  Q  and  all  iterations  k,  there  exists  an  integer  d>\  and  a  corre¬ 
sponding  sequence  of  solutions  6^2,  with  4  =/,4  =  j’ 

n  =  0X...,d-\, 

(b)  for  all  iJ  e  Q,  g  fik)  =  gj^{k),  and  g  fik)  is  independent  of  k 
(i.e.,  g  fik)  =  g^fik)  =  g,j  =  g^j). 

Moreover,  let  the  acceptance  probabilities  satisfy 

(c)  Pr(i?;t(^>i)  -  ^ij)  >  0  for  all  iJ  e  Q  and  all  iterations  k. 
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Then  the  stationary  distribution  for  all  i  g  Cl,  and  for  each  outer  loop  iteration  k,  is 

(«)  =  ^  /- - — 

nefl 

Proof :  The  irreducibility  and  aperiodicity  conditions  of  Theorem  2.1  are  first  shown  to  be 
satisfied.  Then  (3.1),  (3.2),  and  (3.4)  are  shown  to  satisfy  the  reversibility  condition  (2.7). 

Irreducibility.  From  condition  (a),  the  matrix  corresponding  to  the  solution  gen¬ 
eration  probabilities  must  assign  a  path  of  positive  probability  between  any  two  solutions 
ij  e  Q .  Furthermore,  condition  (c)  requires  that  the  acceptance  probability  for  hill 
climbing  moves  between  all  solutions  be  strictly  positive  for  all  finite  iterations  k.  There¬ 
fore,  conditions  (a)  and  (c)  together  imply  that  all  solutions  communicate,  which  is  an 
equivalent  definition  of  irreducibility  (Ross  [1993,  pg.  144]). 

Aperiodicity  (From  Aarts  and  Korst  [1989,  pg.  39]):  Let  i^j  eQ  with  cj  <Cj  and 

gij  >0.  By  condition  (a),  such  a  pmr  always  exists  provided  Q  ^  Cf^^ .  Without  loss  of 
generality,  assume  that  for  at  least  one  solution  n  eiAr(/),  Pr(i?^(/,w)  >  a,  „)  <  1  (without 

this  assumption,  then  it  is  possible  that  for  some  iteration  k,  all  hill  climbing  transitions 
would  be  accepted  with  probability  one,  which  represents  pure  Monte  Carlo  search 
(Hammersley  and  Handscomb  [1964]).  Note  that  the  opposite  extreme  would  occur  for 
iterations  k  if  no  hill  climbing  transitions  are  accepted,  which  represents  local  search 
(Papadimitriou  and  Steiglitz  [1982]);  hill  climbing  search  algorithms  are  designed  to  op¬ 
erate  between  these  two  extremes).  Therefore 
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W  =  1  -  Z^i.«  Pr(^0»  ^  \.n) 

neO,  n*i 

=  1  - gij  Pr(^0 J)  ^  Hj)  -  HSun  Pr(^0’. n)  >  a,  „) 

n£CX  n^ij 


>^-gij-  Zft." 

neCX  n*^ij 

=  1-  H8i.n 

nen,  n^j 

=  0 


which  is  the  criterion  for  aperiodicity  (Cinlar  [1975,  pg.  126]). 


To  complete  the  proof,  reversibility  (2.7)  is  shown  to  be  satisfied.  Substituting  (3.1), 
(3.2),  and  (3.4)  into  (2.7)  leads  to 


Pr(fi>pr,0  a  A.,,,)  ^  Vr(R,(optJ)>.^,)  ^ 

neO  neQ 


(3.5) 


Consider  any  two  solutions  ij  e Q,  i^j,  and  examine  reversibility  under  the  following 


cases: 

Case  1:  If  Vx{R^{opt,j)  >  <  Vv{R^{ppt,i)  >  then  (3.5)  can  be  rewritten  as 

Pr(^*(op/,/)>A,^,)  Vx[R,{pptJ)>^^^,J)  Vx[R,{pptJ)>i.  ) 

- ^ - - — -kIc} — ~  '  —  ^ 

5]Pr(^^(o;7?,«)>A,^„)  Pr(^^(o/7/,/)>A,^,,)  ^ 

nen  neO 


where  reversibility  holds,  fi’om  condition  (b). 

Case  2;  If  ^x{R,^{opt,j)  >  ^opt,j)  =  Pr(^t(cp^,0  ^  j),  then  (3.5)  can  be  rewritten  as 
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VT(R,(opt,i)>^,^j)  Vr(R,(optJ)>^,^j) 

_ ■  ,  . * .  or^  ,  I . ^••{K} 

Y,^T[R,(opt,n)>A^^„)  '■•'  '^?T[R,iopt,n)>A^^„) 

neCi  neQ 

By  condition  (b),  reversibility  again  holds. 

Case  3:  If  Pr^.^^  ipptj)  >  y)  >  Vv{R^{ppt,i)  >  a„^  then  by  the  min  function  in 

(3.1),  this  reduces  to  Case  2.  ■ 

Theorem  3.2  proves  convergence  of  the  stationary  distributions  7c(A:)  to  the  set  of  glob¬ 
ally  optimal  solutions. 

Theorem  3.2:  Under  the  conditions  and  assumptions  of  Theorem  3.1,  if,  for  all 
i,j  eQ  and  for  all  iterations  k, 

Ci  <  Cj  =>  limPr(i^(/.y)  >  a,^.)  =  0  (3.6) 


then,  as  k  approaches  infinity,  the  unique  stationary  distributions  Ti{k)  of  Theorem  3.1 
approach  the  limiting  form 


lim7i,(^) 

k-*oo 


{card(n‘’^)y 

0 

V. 


if  i 

otherwise. 


(3.7) 


Proof:  The  proof,  based  on  a  result  by  Aarts  and  Korst  [1989,  pg.  18],  examines  whether 
a  limit  exists  for  the  sequence  of  stationary  distributions  as  k  approaches  infinity.  Taking 
the  limit  of  (3.4)  for  all  /  gQ  leads  to 
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(3.8) 


lim  71.  (A:)  =  lim 

t-»®  i->® 


Pr(^,(qpt,i)>A^^,) 
J^Pr(Rj,(op/J)  >  A,^j) 


For  any  opt  e  ,  (3.8)  reduces  to 


(3.9) 


For  all  nonoptimal  solutions  n  eQ\Q'’^,  from  (3.6),  (3.8)  reduces  to 

lim  7C„(^)  =  lim-=:^— 

t^co  ^  t->®  y(i) 

jeO'^ 


=  0 


(3.10) 


Together,  (3.9)  and  (3.10)  satisfy  (3.7),  which  completes  the  proof.  ■ 

3.3  niustrative  Examples 

Four  illustrative  examples  (Johnson  and  Jacobson  [1996])  are  presented  to  show 
how  the  class  of  GHC  algorithms  defined  in  Section  3.1  offers  flexibility  in  defining  the 
solution  acceptance  random  variable. 

3.3.1  Generalized  Hill  Climbing  Acceptance  Using  an  Exponential 
Random  Variable 

Set  R^(opt,n)  =  ln(C/)  for  all  opt  eQ‘’^,n  eQ  and  for  all  iterations  k,  where 
t^  is  the  temperature  parameter  (2.2)  and  U  is  distributed  U(0,\) .  Then 

Px{R^ippt,n)  >  =  exp(-A„^„  /  Z^),  and  therefore  (3.1)  can  be  expressed  as 
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=  ininjl,  exp(-A,.^/4)), 

which  is  the  SA  formulation  (Laarhoven  and  Aarts  [1987,  pg.  20]).  Note  that  for  all 
ij  e Q  and  for  all  >  0 ,  exp(-A,j.  1 1^)>0,  hence  condition  (c)  is  satisfied.  Therefore 

if  conditions  (3.3),  (a),  and  (b)  on  the  solution  generation  probabilities  are  met,  then  Theo¬ 
rem  3.1  applies.  Finally,  observe  that  lim|min|l,  exp(-A,j  / =  0  if  >  0 ,  and  1 

otherwise,  hence  condition  (3.6)  is  satisfied  and  so  Theorem  3.2  also  applies.  Note  that 
the  optimal  objective  function  value  does  not  need  to  be  known  to  calculate  (3.1). 

3.3.2  Generalized  Hill  Climbing  Acceptance  Using  a  (Continuous-Valued) 

Geometric  Random  Variable 

Set  RJopt,n)  s . . ^  for  all  opt  eCr^,n  efl  and  for  all  iterations  k,  where 

In(l-A) 

p^  is  a  probability  parameter  such  that  0<p^<\  and  lim =  1 ,  and  U  is  distributed 

k->ot> 

U(0,l) .  Then  Pr^T?;^  (ppt,n)  >  a^^  =  (1  -  >  ^nd  therefore  (3.1)  can  be  expressed 

as 

Pr(7?^(/,y)>Aj  =  min|l, 

=  min{l,  (1-/7^)^''^}. 

(This  geometric  formulation  is  continuous-valued  because  a,j  e  91 .)  Observe  that  for  all 
i,j  e  Q  and  for  all  /?*<!,  (1  -/?*)*'''  >  0 ,  hence  condition  (c)  is  satisfied.  Therefore  if 


exp(-A,^,,.  /4)] 
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conditions  (3.3),  (a),  and  (b)  on  the  solution  generation  probabilities  are  met,  then  Theo¬ 
rem  3.1  applies.  Also,  note  that  lir^min|l,  |j  =  0  if  a,^  >  0 ,  and  1  otherwise, 

hence  condition  (3.6)  is  satisfied  and  so  Theorem  3.2  applies.  Again,  the  value  of  is 
not  needed  to  calculate  (3.1). 

3.3.3  Generalized  Hill  Climbing  Acceptance  Using  a  Weibull  Random  Variable 

Set  i?t(Ojpt,«)  =  tt(-ln(f/))’^“  for  all  opt  ,n  eCl  and  for  all  iterations  k, 
where  a  >  0  is  a  shape  parameter,  4  is  the  temperature  parameter  described  in  (2.2),  and 
U  is  distributed  t/(0,l) .  Then  VT^R^(ppt,n)  ^  a„^  =  exp(-(A„^„  /  /*)“) ,  and  therefore 
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=  0  if  i,j  ,  and  1  otherwise,  hence  condition 


(3.6)  is  satisfied  and  Theorem  3.2  applies.  Finally,  note  that  the  Weibull  random  variable 
formulation  requires  to  be  known,  and  so  the  acceptance  probability  distribution  for 

^kQJ)  cannot  be  written  explicitly  in  terms  only  of  solutions  i  and  j.  However,  its  distri¬ 
bution  function  can  be  expressed  in  closed  form,  as  seen  by  (3.11),  and  as  shown  for  the 
general  case  in  Section  3.3.4. 

3.3.4  The  Generalized  Hill  Climbing  Acceptance  Distribution  for  a  General 
Random  Variable 

Any  acceptance  probability  distribution  for  R^{opt,ri)  that  can  be  described  explic¬ 
itly,  enables  the  acceptance  probability  distribution  for  Rt(i,j)  to  be  expressed  implicitly. 
To  see  this,  rewrite  (3.1)  as 

Pr(^*  1  ~  ^ 

Pr(^^(o/7/,/)  >  ^,^^)-Vx{R,{opt,j)  > 
Vx{R,{opt,i)>^^^^) 

Therefore,  (3.12)  provides  a  general  expression  for  the  distribution  function  for  R^ij,])  in 
terms  of  the  distribution  functions  for  R^{ppt,i)  and  R^(ppt,j) .  If  (3.12)  is  invertible, 
then  a  closed  form  expression  for  R^ij,])  can  be  obtained. 
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CHAPTER  4:  GENERALIZED  HILL  CLIMBING 

CONVERGENCE 
WITHOUT  REVERSIBILITY 

Chapter  4  provides  a  proof  of  GHC  algorithm  convergence  based  on 
homogeneous  Markov  chain  theory  that  does  not  require  the  sufficient  condition  of 
reversibility  (2.7).  The  proof  methodology  is  based  on  a  perturbation  theory  developed 
for  problems  in  linear  algebra.  Convergence  is  shown  by  bounding  the  sequence  of 
stationary  distributions  associated  with  a  general  GHC  algorithm,  with  a  sequence  of 
stationary  distributions  associated  with  a  GHC  algorithm  with  known  convergence 
properties  (e.g.,  see  Section  3.3.1).  The  key  benefits  of  this  approach  are  that  the 
reversibility  condition  is  not  required  and  the  objective  function  value  of  a  globally  optimal 
solution  is  not  needed.  One  limitation  of  this  approach  is  that  as  k  approaches  infinity,  the 
transition  matrix  of  the  general  GHC  algorithm  must  converge  to  the  transition  matrix  of 
the  convergent  GHC  algorithm  at  a  rate  fast  enough  to  control  the  bound. 

4.1  Generalized  Hill  Climbing  Class  Description 

Consider  a  class  of  GHC  algorithms  where,  for  all  ij  eQ,  the  solution  acceptance 
probabilities  are  required  only  to  satisfy  the  conditions 

Pr(i?t  (i,j)  >  Af  j  j  >  0  for  all  iterations  k .  (4.1) 

and 

Ci  <  Cj  =>  0>y)  ^  )  =  0  (4.2) 
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Conditions  (4.1)  and  (4.2),  when  combined  vwth  the  generation  probability  conditions  of 
Theorem  4.1,  ensure  that  all  neighboring  solutions  communicate  for  all  iterations  k,  and 
that  communication  is  gradually  lost  as  k  approaches  infinity. 

4.2  Proof  of  Convergence 

Define  P^(k)  and  7t^(A:)  to  be  the  respective  transition  matrix  and  stationary 
distribution  for  an  arbitrary  GHC  algorithm,  for  each  iteration  k.  Let  P^{k)  and  n^{k) 
be  the  respective  transition  matrix  and  stationary  distribution  for  an  instance  of  the  GHC 
algorithm  that  is  known  to  converge  to  the  set  of  globally  optimal  solutions,  as  k 
approaches  infinity.  Theorem  4. 1  provides  sufficient  conditions  on  the  arbitrary  GHC 
algorithm  transition  matrix  P^(k)  to  ensure  the  existence  of  a  unique  stationary 
distribution  7t*(^)  for  each  finite  iteration  k.  Theorem  4.2  proves  the  existence  of  a 
generalized  matrix  inverse,  and  Theorem  4.3  formulates  a  bound  for  7t*  (it)  in  terms  of 
the  difference  between  P^ik)  and  P’^ik),  and  a  generalized  inverse  corresponding  to 
P^{k).  Theorem  4.4  proves  asymptotic  convergence  to  the  set  of  optimal  solutions 
under  a  condition  on  the  bound. 

Theorem  4.1:  Let  (Q,5V,c)  denote  an  instance  of  a  discrete  optimization 
problem.  Let  the  one-step  transition  probabilities  associated  with  the  GHC  algorithm  be 
defined  as 


47 


j  =  i- 


(4.3) 


>  A, 


0 


1-  Y.W 


M 


Let  the  acceptance  probabilities  satisfy  (4.1).  Let  the  generation  probabilities  gijik) 
satisfy: 

(a)  For  all  k  and for  all  i,j  e  Cl  there  exists  an  integer  d>  \  and  a  corresponding 
sequence  of  solutions  l^,l^J^,...Jj  eQ,  with  4  =7,/^  =  j,  cmd 

m  =  QX...,d-\. 

(b)  For  all  ij  e  Q ,  g,j  (A:)  >  0  =>  gj,  (k)>0. 

(c)  Each  generation  probability  gij{k)  is  independent  of  k  (e.g.,  gij{k)  -  g^j 
for  all  i,J  e  Cl  and  all  iterations  k). 

Then  for  each  iteration  k,  there  exists  a  unique  stationary  distribution  Ti^{k) . 

Proof:  The  irreducibility  and  aperiodicity  conditions  of  Theorem  2.1  are  satisfied,  as 
shown  in  the  proof  of  Theorem  3.1.  ■ 

Conditions  (4.1),  (a),  (b),  and  (c)  ensure  that  a  path  exists  between  any  two 
solutions  in  Q,  for  all  iterations  k.  Condition  (c)  also  implies  that  all  transition 
probabililities  are  asymptotically  monotone,  which  is  needed  for  the  proof  of  Theorem  4.5. 
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Note  that  although  proving  existence  of  the  unique  stationary  distribution 
for  the  transition  probabilities  (4.3)  is  straightforward,  it  is  very  difficult  to  derive  an 
analytically  tractable  explicit  formulation  of  the  stationary  distribution,  for  the  reasons 
discussed  in  Chapter  2.  Therefore,  convergence  to  the  set  of  optimal  solutions  is  shown  in 
terms  of  a  bound  on  the  stationary  probabilities  %^{k) ,  as  k  approaches  infinity. 

To  achieve  this,  the  group  inverse,  A^(k),  is  introduced.  Let 
k  =  1,2,... ,  be  the  sequence  of  transition  matrices  of  irreducible,  aperiodic  Markov  chains 
whose  corresponding  stationary  distributions  |7C^(^)| ,  k  -  1,2,... ,  converge  as  A:  ->  oo 

to  the  form  where  all  probability  mass  is  concentrated  on  the  set  of  optimal  solutions  . 

Meyer  [1975]  defines  A^(k)  as  follows: 
first  let 

A{k)^I-P\k),  (4.4) 

where  /  is  an  Ifl]  x  |Q|  identity  matrix,  and  let 

\\k) 

n(*)^  :  (4.5) 

%^{k) 

be  the  square  stochastic  matrix  composed  of  identical  rows  of  the  stationary  probability 
vector  n^ijc)  corresponding  to  P^(k).  The  group  inverse  is  then  defined  as 

A\k)  =  (l-P^  (k)  +  n(k)y'  -  n(A:)  (4.6) 
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(Meyer  [1975]).  Theorem  4.2  guarantees  that exists,  for  all 

Theorem  4.2:  For  every  transition  matrix  P(Jc),  the  generalized  inverse  matrix 

A*{k)  exists,  where  A{K)  is  defined  by  (4.4).  Furthermore,  if  P(k)  is  irreducible  and 

00 

aperiodic,  then  A^{k)  =  (^)-n(A^)),  where  P^"\k)  refers  to  the  m-step 

fn=0 

transition  probability  matrix  as  defined  in  Theorem  2.1,  and  n(^)  is  defined  by  (4.5). 
Proof:  Modify  Meyer’s  [1975]  result  by  making  each  matrix  a  function  of  k.  ■ 


Define  the  vector  one-norm  ||jc[|,  corresponding  matrix  one-norm 


jea 


|.X|1,  =  m^^  .  Theorem  4.3  pro\ddes  a  bound  for  7C*(^)  in  terms  of 


}€Q 


{P^{k)  -  (Jt))  and  A’*{k) ,  for  each  iteration  k. 

Theorem  4,3:  For  each  iteration  k,  let  P^(k)  be  the  transition  matrix  of  an 
irreducible,  aperiodic  homogeneous  Markov  chain,  let  P^ik)  be  defined  as  in  Theorem 


4.1,  and  let  A*{k)  be  defined  by  {4.6).  Then 


It' (t)  -  n' (i|  <  ||(/>* (t)  -  P’  (i)).4' (i| 


(4.7) 


Proof:  Modify  Meyer’s  [1980]  Theorem  4.2  by  making  each  vector  and  matrix  a  function 
of  k.  ■ 

From  Theorem  4.3,  to  prove  that  lim||7t^(A:)  -  7C^(A^)|  =  0 ,  it  suffices  to  show  that 


(i>'(i)-i"'(*)).4'(i)||  approaches  zero,  for  k  sufficiently  large.  Theorem  4.4 
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represents  a  special  case  of  the  GHC  algorithm  where  {P^  (k)  -  (JcfjA*  {k)  ^ 


can  be 


bounded  by  an  arbitrarily  small  number. 

Theorem  4.4;  For  each  iteration  k,  let  the  GHC  algorithm  one-step  transition 
matrix  P^(k)  be  defined  in  Theorem  4.1,  and  let  the  GHC  algorithm  acceptance 

probability  satisfy  (4.2).  Let  P^{k)  be  the  transition  matrix  of  an 

irreducible  aperiodic  Markov  chain  whose  solution  acceptance  probabilities  satisfy  (4.1) 
and  (4.2),  and  whose  generation  probabilities  are  identical  to  the  solution  generation 
probabilities  of  P^  (k) .  Assume  that  all  probability  mass  of  the  stationary  distributions 
%^(k)  converge  to  the  set  of  optimal  solutions  cQ  as  k-^oo.  Let  A(k)be 
defined  by  (4.4),  and  assume  that  for  every  Z  e  91^ ,  there  exists  an  iteration  k^iZ)  such 
that 

|(P'(«:)-P'(t)).4'(t|  <Z,  (4.8) 

for  all  k  ^  k^iZ).  Then,  by  making  Z  arbitrarily  close  to  zero,  there  exists  some  k^{Z) 
such  that  for  all  ^>^o(Z),  the  stationary  distributions  n^{k)  of  Theorem  4.1 
converge  as  k  ^oo  to  the  form  where  all  probability  mass  is  concentrated  on  the  set  of 
optimal  solutions  . 

Proof:  For  any  Z  >  0  (arbitrarily  close  to  zero),  there  exists  some  k^i^Z)  such  that  (4.8) 
holds  for  all  A:  >  ^0  (Z) .  Use  this  and  (4.7)  to  obtmn 

<Z, 

foraU  itSjt,(Z). 
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Therefore, 

lim  %^{k)  -  7c'*(A^)||  <  lim(Z)  =  Z, 

*->00  111  t->oo'  ' 

and  so 

which  implies  that 

lim||7c®(A^)  -  JC^(A:)|  =  0 ,  as  Z  ->  0 .  ■ 

4.3  Implications  of  Theorem  4.4 

The  principal  contribution  of  Theorem  4.4  is  that  it  implies  that  matrix  perturbation 
techniques  can  only  be  used  to  prove  convergence  of  stochastic  hill  climbing  algorithms 
under  very  restrictive  conditions.  This  is  illustrated  by  Theorems  4.5  and  4.6. 

4.3.1  Norm  of  the  Inverse  Matrix 

The  key  assumption  of  Theorem  4.4  is  that  {P^{k)-P’^{k^A*{k'^  can  be 

bounded  arbitrarily  close  to  zero,  for  k  sufficiently  large.  In  general,  as  k  approaches 
infinity,  ||y4''(^)|  becomes  either  unbounded  or  indeterminate.  (See  Theorem  4.5.) 

To  show  this,  each  element  |.<4^(A^)j  is  first  expressed  in  terms  of  its  cofactor  and 
the  determinant  ofy4(A:),  by  using  (4.6)  and  Rabenstein  [1975,  pg  128],  to  obtain 

-n,{k) 

where 

Q(k)^I-P\k)  +  n{k),  (4.10) 


4.y'(*)|  = 


e,yW 

AQXQik) 
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and  Qji(k)  is  the  cofactor  of  the  element  Qjj{k) .  The  matrix  Q{k)  is  depicted  in 
Figure  4. 1 .  Lemma  4. 1  states  a  result  needed  for  Theorem  4.5. 

Lemma  4.1  (Protter  and  Morrey,  [1991,  pg  53]:  Let  |.(4^(^)|  be  expressed  by 
(4.9)  for  all  i,j  e  and  all  iterations  k,  and  let  Q(k)  be  defined  by  (4. 10).  If 

(a)  lim Q.. (k)  =  Z  or  Urn Q,, (^)  =  oo , and 

(b)  det  Q(k)  ^  0  for  all  iterations  k,  and 

(c)  limdetiSW  = 

t->00 

then 

lim 

t->a) 

Theorem  4.5  shows  that  the  generalized  inverse  matrix  norm  is  asymptotically  unbounded 
or  indeterminate. 


QjAk) 


dete(ifc) 


=  00. 


-P^ik)  +  %l{k)  .. 

P\,ca>d{a)  (^)  +  ^asn/(n)  (^) 

1 

~P2,card(a)  (^)  "*■  ’^eanHa)  (^) 

; 

j*2 

; 

1 

~PccireHQ\l  (^)  +  ^1  W 

-PL^O,2ik)  +  1tUk)  .. 

^  Pcard(a\j  i^)  +  '^card(,a)  i^) 

1 

- 

j*^card{Cl) 

■ 

Figure  4. 1 .  The  matrix  Q{k).  The  elements  in  each  row  and  column 
are  ordered  in  terms  of  increasing  objective  function  value. 
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Theorem  4.5:  Let  A{k)  s  I -P^(k),  where  P^ik)  is  defined  in  Theorem  4.4. 
Then  for  any  instance  of  a  discrete  optimization  problem  (Cl,  IN',  c)  that  contains  at  least 
one  locally,  but  not  globally  optimal  solution, 

is  either  unbounded  or  indeterminate. 

Proof:  The  proof  first  shows  that  conditions  (b)  and  (c)  of  Lemma  4.1  are  satisfied,  and 

concludes  by  examining  the  two  possible  cases  for  Lemma  4.1(a),  as  k  approaches  infinity. 

To  show  that  Lemma  4.1(b)  holds,  assume  that  det|2(^)  =  0  for  some  iteration 
k^.  Then 

does  not  exist,  and  hence  (4.6)  implies  that  A*(kf)  must  not  exist.  But  Theorem  4.2 
states  that  the  inverse  A*^(k)  exists  for  all  iterations  k,  and  so  detg(A^)  0,  for  all  k. 

To  show  that  Lemma  4.1(c)  holds,  assume  without  loss  of  generality  that  the 
objective  function  value  of  each  solution  is  unique.  (If  not,  then  each  solution’s  objective 
function  value  could  be  perturbed  by  some  epsilon  amount.)  Then,  order  the  elements  in 
each  row  and  column  of  the  transition  matrix  P(k)  in  terms  of  increasing  objective  function 
value.  Hence  Q(k)  approaches  a  lower  triangular  form  as  k  increases,  and  thus  for  all  Q(k) 
terms  below  the  main  diagonal. 
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lim[-i>5(Ar)  +  TcJC^r)]  = 

-gfj  otherwise. 

For  all  terms  above  the  main  diagonal, 

=  0, 

and  so  det|2(^)  approaches  the  product  of  its  diagonal  values  as  k  approaches  infinity. 
For  any  locally,  but  not  globally  optimal  solution  /  e  Q ,  there  is  no  solution  j  e  W(;) 
such  that  Cj  <  c, ,  and  so  the  limiting  form  of  the  corresponding  diagonal  element  |2n  (^) 


is 


lim  G ,  ik)  =  li^/,  -P^jik)  +  nf  (A:)] 


=  lim 


lim 

t->oo| 


f  > 

J 

jeCl 

JO' 


/GO. 

J>> 


=  0  +  lim  P,^.(k)  +  lim  nf  (k) 

jeQ. 

y>« 


=  0.  (4.11) 

(Note  that  all  summands  are  bounded  and  nonnegative,  hence  the  limit  of  the  sum  is  equal 

to  the  sum  of  the  limits.)  Thus  for  any  local  (but  not  global)  optimum. 
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and  so  Lemma  4. 1(c)  is  satisfied. 


Finally,  consider  the  two  possible  cases  for  each  cofactor  (^) .  yV  ^  ^  • 

Case  1,  limQ.  .(A:)^0:  Recall  that  \Qj^(k)\  is  defined  as  the  absolute  value  of  the 

determinant  of  the  (card{Q)-\)  x  (cari/(a)- 1)  matrix  formed  by  deleting  the/  row  and 
I*  column  of  the  matrix  Q(k).  Furthermore,  since  Q(k)  =  I-P^(k)  +  n(A:) ,  and  since 
each  element  of  (it)  (and  therefore  n(A:))  is  asymptotically  monotone  from  (4.2)  and 
Theorem  4.1(c),  then  lima, (it)  must  also  be  asymptotically  monotone,  for  all  j,i  gQ. 

it— >«©  ' 

Therefore,  Lemma  4.1(a)  is  satisfied  and  so 

liml4:/*(^)|  = 

Case  2,  MmQ.Ak)  =  0 :  Since  limdet0(A:)  =  0,  then  (4.9)  approaches  an  indeterminate 

*-+00  *-+00 

O/O  form  as  k  approaches  infinity. 

Therefore  fi-om  Case  1  and  Case  2,  lim|U*(A:)|l  is  either  unbounded  or  indeterminate.  ■ 

*->ooll  III 


Note  that  (4.8)  implies  that 


P^(it)-P^(it)|  must  control  the  convergence  of 


||(p^(it)  -  (it))/4''(it)||  .  This  in  turn  suggests  that,  with  respect  to  k,  the  convergence 

rate  of  P^ik)  to  P^ijc)  must  dominate  the  rate  that  Qik)  approaches  a  lower  triangular 
form.  Equivalently,  P^ik)  must  converge  to  P^{k)  faster  than  the  rate  that  (at  least  one) 
eigenvalue  of  Q(k)  converges  to  zero. 
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Note  also  that  as  k  increases  to  infinity,  the  diagonal  elements  of  QQc)  provide  a 
measure  of  the  topology  of  the  (Q,!N',c)  instance.  This  is  illustrated  in  Lemma  4.2. 
Lemma  4.2:  Suppose  that  for  an  instance  (Q,iAr,c) , 

lim  Q,  j(k)>0  for  every  /  ^  (4.12) 

t->00 

then  every  optimal  solution  must  be  globally  optimal. 

Proof  (by  contradiction):  Assume  that  at  least  one  optimal  solution  y  e  Q  is  a  local,  but 
not  global,  optimum.  Then  (4.11)  holds,  which  contradicts  (4.12),  hence  j  must  be  a 
global  optimum.  ■ 

Note  that  if  (4.12)  holds,  then  a  simple  local  search  algorithm  would  eventually 
find  a  global  optimum,  with  probability  one.  However,  the  solution  space  must  be 
enumerated  to  verify  that  (4.12)  does  hold. 

Finally,  (4.11)  implies  that  an  (Q,5V,c)  instance  containing  multiple  local  optima 
would  cause  detQ(^)  to  decrease  relatively  rapidly  to  zero  as  k  increases.  However,  the 
determinant  would  be  very  difficult  to  compute  for  problems  with  large  solution  space 
cardinalities. 

4.3.2  Markov  Chain  Condition 

The  condition  number  of  a  square  matrix  is  a  measure  of  the  distance  of  the  matrix 
(in  terms  of  a  given  norm)  fi'om  a  singular  matrix  (Golub  and  Van  Loan  1989]).  The 
degree  of  closeness  is  measured  by  the  reciprocal  of  the  condition  number.  Meyer  [1980] 
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defines  the  condition  of  the  transition  matrix  corresponding  to  an  irreducible,  aperiodic 
Markov  chain  (indexed  on  each  iteration  k)  as 

co«d(,k)  =  \A{k%\A\k%^, 

where  y4(A:)  is  defined  by  (4.4),  and  A*{k)  is  defined  by  (4.6). 

One  way  to  see  that  the  transition  matrix  Aik)  becomes  ill-conditioned,  as  k 
becomes  large,  is  to  note  that  for  each  row  of  A{k)  corresponding  to  a  local  optimum, 
every  off-diagonal  element  approaches  zero.  This  is  true  because  every  move  from  a  local 
optimum  is  made  with  a  hill-climbing  transition  probability,  which  must  approach  zero  in 
the  limit  (from  (4.2)).  Furthermore,  each  diagonal  term  .4^  ^  (^)(for  a  local  optimum 

_/  €  Q,  such  that  c^  >  Cj  for  all  i  e(nr>5V(y)))  must  approach  zero  in  the  limit,  since 

f  \ 

\  i*J  J 

which  has  limit  zero  as  k  approaches  infinity.  Therefore,  A{k)  approaches  a  singular 
matrix  as  k  approaches  infinity. 

Another  perspective  of  the  conditioning  problem  is  based  on  a  result  by  Ipsen  and 
Meyer  [1994].  They  define  an  irreducible,  aperiodic  Markov  chain  (e.g.,  corresponding  to 
a  transition  matrix  P^(k)),  as  absolutely  stable  if  there  is  a  small  constant  k  such  that 

7t  J  (A:)  -  7C^  (A:)|  <  k[|p^  ik)  -  (^)| ,  for  all  y  e  n  and  all  iterations  k.  (4.13) 
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(Ipsen  and  Meyer  note  that  the  degree  of  smallness  is  problem-specific.)  They  then  show 
how  their  definition  of  stability  can  be  used  to  assess  the  condition  of  a  Markov  chain 
transition  matrix.  Theorem  4.6  provides  this  relationship. 

Theorem  4.6:  For  an  n-state  irreducible  Markov  chain,  the  chain  is  absolutely 
stable  if  and  only  if  all  entries  of  the  group  inverse  A^{k)  are  small,  where  smallness  is 
as  defined  in  Ipsen  and  Meyer  [ 1994 J. 

Proof:  Define  the  group  inverse  of  Ipsen  and  Meyer’s  [1994,  Theorem  5.2]  as  a 
function  of  k.  ■ 

Theorem  4.5  shows  that  for  optimization  problems  with  multiple  local  (but  not 
global)  optima,  the  inverse  matrix  norm  ||/4*(A:)||^  becomes  either  unbounded  or 

indeterminate  in  the  limit.  Theorems  4.5  and  4.6  together  suggest  that  the  Markov  chain 
corresponding  to  the  matrix  A{k)  becomes  absolutely  unstable  as  k  approaches  infinity, 
and  so  in  general,  a  small  constant  k  does  not  exist  that  satisfies  (4.13).  Hence  the 
bounds  (4.7)  and  (4.13)  are  useful  only  for  proving  the  convergence  of  a  limited  class  of 
GHC  algorithms. 

4.4  Dlustrative  Example 

The  following  example,  adapted  fi'om  Anily  and  Federgruen  [1987],  demonstrates 
how  the  rate  that  the  two  transition  matrices  converge  must  dominate  the  matrix  condition 
number  growth. 
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Let  P^{k)  be  defined  by  (3.2)  and  example  (3.3.1),  where  is  defined  in  (2.2) 
and  gi  j{k)  satisfies  conditions  (a)  and  (b)  of  Theorem  3.1.  Note  that  Theorem  3.1  and 
example  (3.3.1)  give  the  result 

Ttf  {k)  =  for  all  /■  G  Q ,  and  all  k. 

tied 

Define  P^ik)  by  (4.3),  and  let  the  solution  generation  probabilities  corresponding  to 
P^ik)  be  identical  to  those  of  P^(k).  Furthermore,  let 

^;w=^..,wpr(/?*(/.j)>A,,) 

(  2  r  11 

=  gij  (k)  min|l,  exp(-A,.  /t,)  +  {t,)  exp[-A,  y  J|, 

for  all  ij  G  ,  such  that  /  9^  y  and  any  y  >  1 .  Note  that  the  term 

(^)'exp[-A,^ /(/,)’'] 

dominates  (4. 14)  for  k  small,  while  the  SA  term  dominates  for  k  large. 

Consider  the  four-solution  problem  depicted  in  Figure  4.2.  Note  that  solution p  is 
optimal,  and  that  limTif  (A:)  =  1.  The  lines  connecting  the  solutions  indicate  the 

k-*<o  ' 

neighborhood  structure.  The  matrix  P^  {k)-P^{k)  is  computed  (see  Figure  4.3). 

The  matrix  A*{k)  is  then  calculated.  A(/*^)  =  l  +  exp(-l/r*^)  +  2exp(-2//j,)  is  a 
normalizing  value  for  the  stationary  distribution  n{k) .  The  matrix 
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Figure  4.2.  Sample  Problem.  The  neighborhood 
structure  is  shown  by  the  lines. 


1 

*-< 

0 

exp(-2/(rt)^)  -03(tt)^  exp(-2/(rt)’') 

0 

(r,)'  exp(-l/(r,)’') 

-0%)^  exp(-]/(rt)’')  -Oi(rt)^  exp(-]/(ft)^ j 

0 

0 

0 

0 

, - 

O 

0 

0 

- 1 

O 

Figure  4.3 .  The  matrix  (k)  -  (k) . 

The  rows  and  columns  are  arranged  in  order  p,  q,  r,  s. 
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Qijk)  =  I -P^{k)  +  n(Ar)  is  depicted  in  Figure  4.4.  The  determinant  of  Qik)  is  (using 
Mathematica) 


detect) = 

N(h)  mi,) 


+ 


23exp(-3/r*)  exp(-2/rt)  03exp(-l/rjt) 
1^)  mt,)  N{i~)  = 


^  03exp(-l/  4) 

-  J^) 


(4.15) 


(Note  that  limdet^(A:)  =  0,  as  noted  by  (4.11).)  The  matrix  A*(k)  is  shown  element- 

k-*oo 


wise  as 
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■^u  (^)  =  1-5  exp  —  + - - 

N(t,) 


-  [-3]  .  f-21  f-1] 

2exp  —  2exp —  exp  — 

\tj  1  1 

N(h)  N(t,)  ^  N{t,)  detQik)  Nit,y 


(-31  1-2' 

exp  —  exp  — 


n.  (-2)  "UJ  ‘'^1/ 

■^ij(^)=  Oiexp  — - : 

N(t,)  Nit,)  Nit, 


exp  — 


Nit,)  \d^mk)  Nit,)  ’ 


■^i3i^)=  03  exp  — 

’  \ttJ 


r-3i 

Oiexp  — 

wj 

Oiexp[^] 

1 

Nit,) 

Nit,) 

dete(/t) 

Oiexpl- 

A!4W=  Oiexp  ^ - ^ 

\t,J  Nit,) 


Oiexp 


exp|-^l 


Nit,)  detQik)  Nit,)  ’ 


,  .  expj— J  expf— ] 

4Alt)=  Oiexp  ^  _ _ 

Nit,)  Nit,)  Nit,)  detQik)  Nit,)’ 


^2.2(^)=  O-Sexp —  + 


2  expf —1 

1  * 

1 

Nit,) 

Nit,) 

dete(A:) 

exp  — 
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r-4^ 

exp  — 

f-2^ 

Oiexp  — 
Ut; 

1  OJexp(^] 

-j  * 

1 

Nit,) 

N(t,) 

Nit,) 

dQtQik) 

Nit,)  ’ 

(-4] 
exp  — 

Oiexp(^^ 

Oiexpg] 

-|  * 

1 

“"(3 

m,) 

NUt) 

Nih) 

dQtQik) 

Nih)  ’ 

m,)  ’ 
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Al(k)  = 


Oiexp 


r  ^ 

(  \ 

-5 

-4 

> 

exp 

exp 

■3 

m,) 


m,) 


0.75  exp 


0.5expf— 1  0.25  expf  ^ 


m,) 


m,) 


Nih) 


1 


exp 


dete(*)  N{t^) 


Al(k)  = 


Oi 

m,) 


1 

detg(^) 


1 

N{t,y 


<2^  = 


exp 


Oiexpl— 1+-^^:^ - 

UJ  N{t,)  Nit,) 


03  exp 


m,) 


1 


detG()t) 


mt,)  ' 


<3W  = 


03  exp 


-3 


yk>> 


r  \ 

-5 

-4 

exp 

exp 

Nit^)  Nit^) 


r-3^ 

r-2^ 

0.75  exp  — 

0.5  exp 

0.25  exp 

— 

J 

I4  J 

_i_ 

1 

m,) 

m) 

m) 

dot  Qik) 

mt,)  ’ 
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Al,(k)  = 


03  exp 


exp 


^5' 


y'kJ 


+  ■ 


exp 


Nitk)  N{t^) 


1.75  exp 


03  exp 


Nit,) 


■  +  ■ 


0.75  exp 


^1 


Nit,) 


Nit,) 


1 


detg(^^) 


Nit,)  • 


Note  that 


|(i>'  (*)  -P'<(k))A'  (*)[  £  ||(J"  (*)  -  P*  (i)|  \a‘  (*)|[ 

(Golub  and  Van  Loan  [1989].  Furthermore,  for  the  example  depicted  in  Figure  4.2, 

|(P^(^)-P'*(A:))|[  =2(4)"exp(-l/(/,y)  foralU.  (4.16) 


Using  the  lower  bound  (4.15)  for  det0(^^),  there  exists  some  such  that  for  all  k> 
and  all  i,j  eO, 


Ml,), 


0.5exp(-l/t^) 

I  V(4)  J 


and  so  (accounting  for  the  four  solutions  in  n),  the  desired  upper  bound  is 

||^''W||,  ^4(4exp(l//j). 


(4.17) 


Therefore,  (4.16)  and  (4.17)  lead  to 

Kp'W  -  P'  (i)|  ||.4'(*|  £  7(1, y  exp(-l/((,)'  )16exp(l  /  <.) 
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(4.18) 


=  32(/J'exp(l//^-l/(rJ'^j. 


Finally,  taking  the  limit  of  (4.18)  as  k  approaches  infinity,  and  using  condition  (2.2)  leads 
to 


and  so 


lim7iJ(A:)  =  l, 

i-+oo  ^ 

which  is  the  desired  result. 

Note  that  if  y  <  1 ,  then 

and  so  condition  (4.8)  is  not  satisfied.  Therefore  if  y  <  1,  then  Theorem  4.4  could  not  be 
used  to  prove  convergence  for  this  example. 
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CHAPTER  5:  GENERAL  PROOF  OF  CONVERGENCE 

FOR  THE 

GENERALIZED  HILL  CLIMBING  ALGORITHM 


Chapter  5  provides  a  convergence  proof  for  GHC  algorithms,  based  on  sufficient 
conditions  for  asymptotic  transition  probabilities  between  local  and  global  minima.  The 
principal  contribution  of  this  proof  is  the  relaxation  of  the  Markov  chain  reversibility 
condition,  without  requiring  the  special  conditions  on  the  GHC  algorithm  that  are  needed 
for  the  convergence  proof  in  Chapter  4. 

5.1  Deflnitions  and  Notation 

Define  G  c  Q  to  be  the  set  of  globally  optimal  solutions  (i.e.,  /  €  G  if  c,  <  Cj  for 
all  y  eG).  Define  L(z.Q\G  to  be  the  set  of  locally,  but  not  globally,  optimal  solutions 
(i.e.,  /  eZ,  if  Cj  <  Cj  for  all  j  e5V(/)).  Finally,  define  Zf  =  Q\(ZuG)  to  be  the  set  of 
all  other  solutions  in  Q . 

GHC  algorithms  traverse  the  solution  space  Q  in  search  of  a  globally  optimal 
solution.  To  understand  this  process,  the  concept  of  a  path  between  solutions  must  be 
defined. 

Definition  5.1:  A  path  from  i  to  j,  for  all  i,j  e(Z  uG),  is  a  sequence  of  solutions 
/o,/,,...,4  eQ  with  l^=i,lj=j,  /,,4,...,/^_,  eZf,  and  gi^j^S^)>0  for 
m  =  0,1,. ..,f/  - 1,  and  for  all  iterations  k. 
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Note  that  a  local  or  global  optimum  cannot  be  an  intermediate  solution  on  any 
path.  (The  GHC  algorithm  can  move  from  i  to  j  via  an  intermediate  solution  I  g(LuG), 

but  the  trajectory  would  not  be  defined  as  a  path.)  Two  paths  can  be  either  equivalent,  or 
distinct.  These  concepts  are  formally  defined. 

Definition  5.2:  Two  paths  between  solutions  i,j  e[L<jG)  are  said  to  be 
equivalent,  if 

a)  all  the  solutions  visited  along  both  paths  are  identical; 

b)  the  order  in  which  each  solution  is  visited  along  both  paths  is  identical. 

If  a  path  between  solutions  i,j  g{LkjG)  is  not  equivalent  to  any  other  path  between  i 
and  j,  then  the  path  is  said  to  be  distinct. 

Using  these  definitions,  the  probability  of  transitioning  between  solutions 
i,j  e(ZuG)  can  be  defined.  In  particular,  yj  is  the  probability  of  transitioning 

along  the  (distinct)  path  between  i,j  e(ZuG),  at  iteration  k.  yj  is  equal  to 

the  product  of  all  one-step  transition  probabilities  between  adjacent  solutions  along  the 
path;  e.g.,  for  i,j  eQ,  the  path  /,/,,...,/j_,,y  eQ  occurs  with  probability 

(51) 

^  '  m=0 

Note  that  path  distinctness  is  sufficient  for  the  probability  of  the  union  of  all  distinct  paths 
between  i,j  g  Q  to  be  equal  to  the  sum  of  the  probabilities  of  the  paths  (Cinlar  [1975,  pg 
3]),  and  so 
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all  distinct 
paths  between 
i,je{L\jG) 

E 

n=I 


for  all  ij  e  (Z  u  G),  i  ^  j. 


1- 


J€(LuO) 

s^i 


f=Js 


(5.2) 


0 


otherwise. 


The  example  depicted  in  Figure  5.1  illustrates  how  the  path  probabilities  (5.2)  are  defined. 
Note  that  i,l  &  L,  j  eG,  and  p,q,r,s  &  H .  The  neighborhood  structure  is  indicated  by 

the  lines  between  nodes.  Hence  (/  — )•  y)  >  0  ,  since  /  and  j  are  separated  only  by  q. 
Similarly,  (/  ->  y)  >  0 ,  since  /  and  j  are  separated  only  by  r.  However,  the  only  way  to 
reach  solution  /  fi-om  solution  i  is  to  pass  through  the  global  optimum  j,  and  so 
/>,(<•->/)  =  0. 
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Recall  that  the  GHC  algorithm  is  composed  of  an  outer  loop,  indexed  on  k,  and  an 
inner  loop,  indexed  on  m.  Furthermore,  7c(^)  is  the  equilibrium  (long-run)  probability 
vector  for  all  solutions  /  e  Q ,  for  each  k,  as  m->oo. 

Deflnition  5.3;  Let  o  (^)  =  ^  k,  (k) ,  and  define  the  equilibrium  probability 

i^LuG) 

6,. (^)  =  71,  (k)  I (ii{k),  for  all  /  e  (Z  u  G)  and  all  k. 

Let  6(^)  be  the  vector  of  probabilities  5i(k),  for  all  solutions  i  e(L<jG).  Note  that 
5^(^)  is  obtained  by  scaling  the  corresponding  equilibrium  probability  TZi(k)  only  for  the 
solutions  i  e(L'uG).  Furthermore,  5, (k)  >  7C, (k)  for  all  k,  since  a (^)  <  1 . 

Finally,  note  that  (5.1)  and  definition  5.3  allow  the  convergence  proof  (Theorem 
5.2)  to  focus  only  on  the  set  of  local  and  global  optima,  which  are  really  the  only  solutions 
of  interest  for  a  GHC  algorithm. 

5.2  Proof  of  Convergence 

Theorem  5.1  provides  sufficient  conditions  for  the  GHC  algorithm  to  converge  to 
the  set  of  solutions  (ZuG)  as  k  approaches  infinity.  Theorem  5.2  provides  additional 
sufficient  conditions  for  the  GHC  algorithm  to  converge  to  the  set  of  globally  optimal 
solutions  G  c  Q,  as  ^  approaches  infinity.  Note  that  Theorems  5. 1  and  5.2  do  not  use  the 
Markov  chain  model  to  prove  convergence  of  the  GHC  algorithm.  However,  the  property 
of  conditional  independence  continues  to  hold:  for  all  solutions  y  eO,  the  probability  of 
transitioning  to  the  next  solution  is  dependent  only  on  the  current  solution. 


71 


Theorem  5.1:  Let  (Q,5V,c)  denote  an  instance  of  a  discrete  optimization 
problem.  Let  each  GHC  transition  probability  Pi  jik)  be  defined  by  (3.2).  Let  the 
generation  probabilities  gijik)  satisfy  (3.3)  and  the  conditions 

(a)  for  all  ij  e  Q  and  all  iterations  k,  there  exists  an  integer  d>\  and  a 

corresponding  sequence  of  solutions  eQ,  with  4  =/,/j  =  j,  and 

(b)  lim  g^  .  {k)  >  0  for  all  iJ  &C1,  j  &  5V(/) . 

k->«> 

Moreover,  let  the  acceptance  probabilities  satisfy 

(c)  Pr(/?jt(/,y)  ^  A,^  j  >  0  for  all  iJ  eO  and  all  iterations  k, 

(d)  c,  <  c .  =>  hm  Pr(i4 (/,y)  S  a,. ^.)  =  0 . 

Then 


=  0  for  all  /■  e/T. 

k->ao 

Proof:  (First,  recall  that  conditions  (a)  and  (c)  are  sufficient  for  a  unique  distribution 
n(k)  to  exist  for  each  k,  as  proven  in  Theorem  3.1.)  The  proof  is  by  contradiction. 
Assume  that  all  solutions  in  H  have  strictly  positive  probability  in  the  limit,  and  order  the 
h  =  cardQL)  solutions  in  H  such  that  for  all  i,j  &  H ,  i  <j  implies  c,  <  Cj .  (Without 

loss  of  generality,  let  the  objective  function  value  of  each  solution  in  H  be  unique).  Let 
solution  number  one  correspond  to  the  solution  in  H  with  the  smallest  objective  function 
value,  solution  two  correspond  to  the  next  smallest,  and  so  on,  with  solution  h 


72 


corresponding  to  the  solution  in  H  with  the  largest  objective  fiinction  value.  Then,  the 
corresponding  equilibrium  probabilities  can  be  expressed  in  terms  of  h  equations,  using  the 
law  of  total  probability  (Cinlar  [1975,  pgl5]). 

The  proof  first  shows  that  the  equilibrium  probability  lim7C;,(^)  is  zero  for  the  /»* 

solution,  and  then  establishes  the  equilibrium  probabilities  of  the  remaining 
solutions  to  be  zero,  using  backward  substitution. 

First,  express  the  equilibrium  probability  vector  n(k)  in  terms  of  the  law  of  total 
probability,  by  conditioning  on  every  solution  j  e Cl,  to  obtain 

>  for  all  /  sQ  and  all  k.  (5.3) 

yen 

Next,  consider  the  set  of  solutions  Hcz  Cl,  and  assume  that 

lim  71,  (^)  =  8,  >  0 ,  for  all  /  e  .  (5.4) 

For  any  solution  i  &  H,  the  equilibrium  probability  expressed  in  terms  of  (5.3)  as 

yeG  yet  yeff 

Collect  71,  (^)  terms  on  the  left-hand  side  to  obtain 

7t,(i)(i-  /’,(*)) = ■  (5.5) 

yeG  yet  yeff 

j*i 

Note  that 

{!-/;,,(*))= Z^./(*)=  Z^./(*)+  Z^./(*)- 

yen.  yen,  yen, 

i*j  i*j,  Oi«>j 

C,iCj 

(e.g.,  (l  -  is  the  probability  that  the  process  does  not  remain  at  solution  i  gH 'm 

the  next  transition).  Hence,  (5.5)  can  be  expressed  as 
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/€n,  Jen.  JeG  JeL  JeH 


Jen,  Jen, 

i*J,  ‘>i<‘>j 

<Ci^Cj 


Rewrite  the  right-hand  side  in  terms  of  hill  climbing  transitions  for  elements  in  set  H  to 
obtain 


Jen,  Jen,  JeG  JeL 


Jen,  Jen, 

i*J, 


Note  that  since  all  sums  are  over  a  finite  number  of  bounded,  nonnegative  elements,  then 
the  limit  of  each  term  exists,  and  the  limit  of  the  sums  is  equal  to  the  sum  of  the  limits 
(Protter  and  Morrey  [1991,  pg  37]).  Therefore, 


f  "if 

=  lim  W  +iim  W 


+  lim  Yj^j{k)Pji{k)  +lim  Yj'^j{k)P^j{k) 


From  (3.2)  and  condition  (d),  all  one-step  hill  climbing  transition  probabilities  approach 
zero  in  the  limit.  Therefore, 
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f 


lim 


/eCl. 

C,<Cj 


=  lim 

k-*m 


Kjea  > 


since  each  summation  includes  only  deteriorating  (hill  climbing)  transition  probabilities. 
Therefore  in  the  limit,  (5.6)  simplifies  to 


limi 

*-»«> 


f  N 

/ea. 

i*i. 

\  C,^Cj  J 

II 

jgH. 

Kcj^c,  y 

,  for  all  /  &H . 


(5.7) 


Consider  (5.7)  for  the  particular  solution  i  =  h&H  (recall  that  h  is  the  solution  of 
maximal  objective  function  value,  e.g.,  c*  >c^.  for  all  j  QH\{h],  since  all  solutions  are 

arranged  in  order  of  increasing  objective  function  value).  The  right-hand  side  of  (5.7)  is 
zero  because  there  are  no  solutions  y  e  of  cost  greater  than  that  of  solution  h. 


Recall  that  lim  >  0 ,  fi-om  assumption  (5.4).  Furthermore,  since  h  is  by 


definition  not  a  local  minimum,  then  there  must  exist  at  least  one  solution  /  e  Q , 
/  e  5V(/i) ,  such  that  c^^c,.  Therefore,  from  (3.2)  and  condition  (b),  the  left-hand  side  of 
(5.7)  is 

r  ^ 


lim 

k-ya 


/gQ, 

h*j, 

\  Oh^j  / 


=  6,  lim 

C^^Cj 


t-KO 

>0, 
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which  is  a  contradiction,  since  the  right-hand  side  of  (5.7)  has  limit  zero.  Thus 

lim  =  0 . 


Now  express  (5.7)  in  terms  of  the  particular  solutions  h  and  (A  - 1)  e  to  obtain 


lim 

ieCi, 


=  lim7t*(^)P*(,_,)(^). 


The  right-hand  side  of  (5.9)  is  equal  to  zero,  from  (5.8).  Now  consider  the  left-hand  side 
of  (5.9).  Since  the  particular  element  (A -1)  eH  is  the  solution  of  maximal  objective 
function  value  in  the  set  //  \  {A} ,  hence  it  is  not  a  local  minimum,  then  there  must  exist  at 
least  one  solution  leQ,  /eiAr(A-l),  such  that  Therefore,  from  (3.2), 

condition  (b),  and  (5.4), 


lim  I, 

/eO,  Jea, 


which  is  a  contradiction,  because  the  right-hand  side  of  (5.9)  has  limit  zero.  Thus 

limVoW^O- 


Continuing  in  this  manner,  let 


lim7r,(A)  =  0,  for  all  J  =  2,3,...,h,  jeH, 


(5.10) 
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and  replace  (5.4)  with  the  assumption  that 


lim7Ui(^)  =  8.  >  0,  for  solution  \eH . 

k-*’B 


(5.11) 


Note  that  element  1  e/f  is  the  solution  of  minimum  objective  function  value  in  the  set  H. 
Rewrite  (5.7)  in  terms  of  the  particular  solution  1  e/f  to  obtain 


lim 


f  's 

f  \ 

jeQ. 

^*J. 

=  lim 

JeH. 

j*h 

Vcyic,  J 

(5.12) 


The  right-hand  side  of  (5.12)  is  equal  to  zero,  from  (5.10).  Now  consider  the  left-hand 
side  of  (5.12).  Since  the  particular  solution  \gH  can  not  also  be  in  the  set  (ZuG), 

hence  it  is  not  a  local  minimum,  and  so  there  must  exist  at  least  one  solution  /  eQ, 
/  e  iAr(l) ,  such  that  Cj  ^  c, .  Therefore,  from  (3.2),  condition  (b),  and  (5. 1 1), 


lim 


V 


=  e,lim 


yen. 

yew(l), 

C^iCj 


^eilimg,,(^) 

it->oo 


>0, 

which  is  a  contradiction,  because  the  right-hand  side  of  (5.12)  has  limit  zero.  Thus 


Therefore, 


lim7C,(^)  =  0. 

k-*<o 


lim7Cj(Z)  =  0,  for  all  i  &H . 

k-*’*) 


Corollary  5.1  shows  that  for  all  local  and  global  optima  i  g(L^G),  the  probabilities 
5,  (ife)  approach  the  equilibrium  probabilities  7t,  (^)  in  value,  as  A:  ->  qo  . 
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Corollary  5.1 :  For  all  i  g(LkjG), 

lim  5,  (k)  =  lim  ti,  {k) . 

k-*<o 


Proof:  Taking  the  limit  of  (2.6)  as  A:  ->  qo  leads  to 

r  ^ 


=  0+  2  lim;t,(A:) 

k-*<o 


fe(IuG) 


=  lim 


»e(Z,uG) 


=  lim(o(^). 

t->® 


Since  lim  oa  (^)  =  1  and  since  0  ^  Urn  (k)  <  1  for  all  /  e  Q ,  hence  the  limit  of  the 

*->®  k-*a>  ' 


quotient  (k)  /  ©  (k)  is  equal  to  the  quotient  of  the  limits  (Protter  and  Morrey  [1991, 
pg  39],  and  so 

lim 6,  (k)  =  lim(7t. (k)  / ©(A:))  =  lim  7i,  (k) ,  for  all  i  e(L<jG)  m 

t->®  *-»*'  '  k-ya>  '  ' 

Theorem  5.2  provides  sufficient  conditions  for  the  equilibrium  probability  of  all 
local  (but  not  global)  optima  to  approach  zero,  as  k  approaches  infinity. 

Theorem  5.2:  Under  the  conditions  and  assumptions  of  Theorem  5. 1,  if 

(e)  ^Pkij-^i)=^^  for  all  j  g  L,  /  e  (L  u  G), 

*•=1 

such  that  if  ->/■)>  0  for  all  k, 

(f)  iPkQ^J)<^°^  for  alii  eGJ &L, 

k=\ 
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(g)  'Z^j(k)Pk(J  ->  ^)  <  for  all  j,q &L,q^j,q 

k=\ 

then 

lim  5  Ak)  =  0  for  all  j  eL.  (513) 

k-*a>  •' 

Proof  (by  contradiction):  First,  each  element  of  5(^)  is  expressed  in  terms  of  the  law  of 
total  probability,  using  the  path  probabilities  (5.2).  Hence  for  each  iteration  k, 

=  +  forall  yeiuG.  (5.14) 

ieO  ieL 

Next,  assume  there  exists  some  j  &  L  and  an  iteration  such  that  for  all  ktk^, 
dj(k)  >  8  >  0 .  Summing  over  all  iterations  k  leads  to 


*=1  i=l  iea  t=l  iei 

Since  Q  is  finite  and  all  summands  are  nonnegative,  then  the  order  of  the  summations  can 
be  interchanged,  resulting  in 

*=1  i€G  k=\  isL  t=l 

Collecting  dj(k)  terms  on  the  left-hand  side  leads  to 


= 2E5,(t)n(/  ^y)+E|;6,(i)p.(i  (5.15) 

k=l  iea  t=l  ieL,  k=\ 

•*J 

Note  that  (l  -  ->  y))  is  the  probability  that,  given  the  process  is  in  solution  y  eZ, 

the  process  transitions  to  any  solution  /  e(LuG)  except  solution y.  (Note  that  since  all 
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solutions  communicate  (from  Theorem  5.1),  a  path  must  exist  such  that  j  can  reach  some 


q  &  {LkjG)  )  Therefore,  two  cases  are  possible. 


Case  1 :  Suppose  the  process  transitions  to  a  particular  global  optimum  q  eG.  Hence, 


and  so  (5.15)  becomes 

i=l  /eO  k=i  ieL,  k=\ 

i*J 

Since  6,  (k)  <  1  for  all  /  e  Q  and  all  k,  then  (5.16)  can  be  rewritten  as 


t=l  *=*0+1  ieG  k=\  i£L,  i=l 


For  the  left-hand  side  of  (5.17),  condition  (e)  leads  to 


Y,^Pk(j^q)  =  ^.  (5.18) 

*=*0+1 

Now  consider  the  right-hand  side  of  (5.17).  Conditions  (f)  and  (g)  and  the  fact  that  Q  is 


finite  leads  to 


Z  Z  ^  j) + Z  Z^/  J)  < 

iea  *=1  ieL,  k=\ 

i*J 

which  contradicts  (5.18).  Therefore  there  cannot  exist  any  iteration  such  that  (5.18) 


holds,  and  so  condition  (e)  implies  that 


lim5  (^)  =  0. 


Case  2:  Suppose  the  process  transitions  to  a  particular  local  optimum  q  eL.  Then  using 


the  same  argument  as  Case  1, 
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lim5.(A:)  =  0, 

t-»<o  ■* 

and  so 

lim5,(A^)  =  0,  forall  jeL.  ■ 

k-*ai  •' 

5.3  Implications  of  Theorems  5.1  and  5.2 

Theorems  5.1  and  5.2,  and  Corollary  5.1,  together  prove  that  under  certain 
conditions,  the  set  of  globally  optimal  solutions  G  must  occur  with  probability  one  as  k 
approaches  infinity,  since  the  equilibrium  probabilities  for  any  solution  in  H<jL 
approaches  zero.  However,  the  theorems  do  not  show  how  the  probability  mass  is 
asymptotically  distributed  among  the  global  optima.  Hence  in  the  limit,  some  globally 
optimal  solutions  may  occur  with  greater  probability  than  other  global  optima. 

Note  that  conditions  (e)  and  (Q  of  Theorem  5.2  are  consistent  with  Hajek’s 
condition  for  the  simulated  annealing  algorithm’s  cooling  parameter  P  in  (2.21).  If  the 
GHC  algorithm  is  formulated  as  SA,  then  conditions  (e)  and  (f)  are  satisfied  if  P  is  greater 
than  or  equal  to  the  depth  of  the  deepest  local  minimum  in  the  set  L,  but  less  than  the 
depth  of  a  global  minimum.  Note  also  that  conditions  (e)  and  (g)  together  imply  that  for 
all  solutions  j  &  L,  each  equilibrium  probability  bj{k)  must  approach  zero  at  a  minimum 

rate  sufBcient  for  (g)  to  hold,  as  ^  oo. 

Condition  (g)  requires  that  the  equilibrium  probability  distribution  %ik)  be 
explicitly  known  only  for  the  set  of  local  optima.  On  the  other  hand,  Anily  and 
Federgruen’s  [1987]  convergence  theorem  requires  that  the  equilibrium  distribution 
7c(^)be  known  for  all  solutions  in  Q— hence  Theorem  5.2  is  a  relaxation  of  Anily  and 
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Federgruen’s  [1987]  result,  in  that  Theorem  5.2  requires  equilibrium  distribution 
information  for  only  a  (presumably  small)  subset  of  the  solution  space.  However,  in 
practice  this  condition  would  still  be  very  difficult  to  check,  unless  reversibility  (2.7)  is 
also  satisfied. 

5.4  Blustrative  Examples 

Example  5.4.1  illustrates  how  a  GHC  acceptance  function,  based  on  a  polynomial 
function  of  k,  satisfies  the  conditions  of  Theorems  5.1  and  5.2.  Example  5.4.2  shows  that 
the  threshold  accepting  algorithm  does  not  satisfy  the  sufficient  conditions  in  Theorem  3.2, 
Theorem  4.4,  and  Theorem  5.1. 

5.4.1  Generalized  Hill  Climbing  Acceptance  as  a  Polynomial  Function  of  k 

Consider  the  eight-solution  example  depicted  in  Figure  5.2,  where  G  =  {/?}, 

L  =  H  Let  each  one-step  transition  probability  be 


P 

Ta 

O 

Ti  o/ 

/ 

\ 

\ 

/ 

^  rs 

r2 

<l2 

Figure  5.2.  Sample  problem.  The  neighborhood 
structure  is  shown  by  the  lines  connecting  the  nodes. 
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defined  (for  all  ^  >  2 )  as  in  Figure  5.3.  Note  that  g,  y  {k)  =  \l2  for  all  /  g  Q ,  y  e  , 
and  for  all  k.  Also,  R^^p,r^  =  for  p,rj  eQ,  y  =  l,2,3,4,  and 

Ri^^qi,rj)  =  Ag^  rj  for  all  /  =  1,2,3  and  y  =  1,2,3,4,  where  17  is  distributed  t/(0,l). 

Then  Pr(/^(/?,ry)  >  =  l/^" ,  and  Pr(i^(^„ry)  ^  a,^  =  1/^  for  aU  /  =  1,2,3  and 

7  =  1,2,3,4,  and  all  k.  Therefore,  all  solutions  in  O  communicate,  and  so  conditions  (a) 
and  (b)  of  Theorem  5.1  are  satisfied.  Furthermore,  all  hill  climbing  transition  probabilities 


Figure  5.3.  The  one-step  transition  matrix  P(k),  defined  for  k  >2. 
The  rows  and  columns  are  arranged  in  order  • 
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from  every  solution  in  (Z  u  G)  to  its  neighbors  in  H  are  strictly  positive,  with  limit  zero  as 
k  hence  conditions  (c)  and  (d)  of  Theorem  5.1  are  satisfied,  and  so  Theorem  5.1 
applies.  The  sufficient  conditions  of  Theorem  5.2  are  now  addressed. 

Condition  (e)  examines  the  paths  of  positive  probability  from  every  local  optimum 
to  all  solutions  in  (Z  u  G) .  Nine  positive  path  probabilities  exist: 

(i)  ^*(^1 

(ii)  ^t(^i->p)>0, 

(iii)  P,(q^  =  ->/?)>0, 

(iv)  P,(q^->qy)>0, 

(v) 

(vi)  ^2)  =  1  -  A(^2  ^  ^3)  >  0, 

(vii)  Pi(qi->q2)>0, 

(viii)  Pt(q3->P)>0, 

(ix)  = 

Note  that  Pic(q2  P)=  Pk(^\  ^3)  =  ^*(^3  ^  ^1)  =  since  either  path  must  visit  an 

intermediate  solution  in  LuG.  Note  also  that  from  the  problem’s  symmetry,  the 
probabilities  (i),  (ii),  and  (iii)  are  respectively  equal  to  (vii),  (viii),  and  (ix).  In  addition,  (i) 
is  equal  to  (ii),  (iv),  and  (v);  and  (iii)  is  equal  to  (vi).  Hence  only  (i)  and  (iii)  must  be 
checked.  For(i), 
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k=2 


■  ■  k=2 

=i;v(2*)(i/2) 

k=2 

=SV(«) 


k=2 
=  00. 


and  for  (iii). 


Zni9, 


k=2 


«.) = i[i  w)- 

=  f;[l-V(2tXl/2)-l/(2*Xl/2)] 

k=2 


k=2 
=  00. 


Therefore  condition  (e)  holds. 

Condition  (f)  is  now  addressed.  From  the  problem  symmetry, 


E  nC;-  ^  ?,) = ->  ?.) = Zi/(2*’)(V2) 

=2 

!:>/(«’) 


k=2 


k=2 

<00, 


and  so  (f)  holds. 

Condition  (g)  requires  that  the  equilibrium  probability  vector  b{k)  be  known  for 


all  solutions  in  L.  Hence  by  solving  (2.5),  Ti^{k)  =  — 


+3k  +  4 


and 


k  ]c^  "1“  3k 

'^C.  (^)  =  f— ;7~; »  ^  =  1.2,3.  Therefore,  (i)(k)  =  -^  -,  and  so 


+3A:  +  4 


A:'+3A:  +  4 
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^p(^)-  ~2 -  ^g,  (^)  “  75 - r  ^  =  1.2,3 .  Recall  that  the  only  positive  path 

k  "f*  3^  .  k  “h  3a 

probabilities  between  qf,qj  eL,  i^j,  are  (i),  (iv),  (v),  and  (vii);  furthermore,  they  are  all 
equal.  In  addition,  =  6,^(^)  =  5^^(it)  for  all  k,  and  so  it  suffices  to  check 

condition  (g)  for  only  one  path  between  solutions  in  L,  e.g.,  for  q^  and  ^2  •  Hence, 


00 


-^^2)= 


<co, 


and  so  the  sufficient  conditions  of  Theorems  5.1  and  5.2  are  satisfied.  Therefore 
lim  5  -  (^)  =  1 .  (Note  that 

*->00  ^ 


lim  6  (A:)  =  lim 


ik->ao 


*^00^2  *->co 


limTt  (^)  =  lim 


*-+00^2  ^3^ 


=  1. 


which  confirms  the  result.) 

5.4.2  GHC  Formulated  as  Threshold  Accepting 

The  threshold  accepting  (TA)  algorithm  (Dueck  and  Scheuer  [1990])  results  fi’om 
fixing  the  random  variable  as  a  constant  for  each  k.  To  implement  the  TA  algorithm, 
define  an  initial  threshold  such  that 


Q,  >  max(c.-c.) 

(5.19) 

for  all  k 

(5.20) 

lima=0. 

(5.21) 
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(The  initial  threshold  represents  the  minimum  one-step  increase  in  objective  function 
value  necessary  for  the  GHC  algorithm  to  be  able  to  transition  from  any  state  /  to  any 
other  state  j  ).  Then  the  GHC  acceptance  probability  distribution  is 

_'l  <5.22) 

0  otherwise. 

Note  that  no  proofs  of  TA  convergence  to  (1.1)  are  presented  in  the  literature  (Althofer 
and  Koschnick  [1991]).  Lemma  5.1  illustrates  why  the  TA  formulation  does  not  satisfy 
the  sufficient  conditions  needed  for  the  convergence  proofs  of  Theorems  3.2,  4.4,  or  5.1. 
Lemma  5.1;  Suppose  the  GHC  algorithm  acceptance  probability  Pr(i?i  ^  is 

defined  as  the  TA  formulation  (5.22).  Then  the  transition  probabilities 


Qk  ^ 

0 

1- 

seiHft) 


j^7f(i),j^i, 

j  ^!N'(i)J^i,  (5.23) 

j  =  h 


do  not  satisfy  the  irreducibility  condition  of  Theorems  3.2,  4.4,  and  5.1,  for  the  existence 
of  a  unique  stationary  distribution  %{k)  for  each  iteration  k. 


Proof:  Irreducibility  cannot  be  satisfied  unless  all  states  communicate  (Ross  [1993,  pg 
144]).  However,  from  (5.21)  the  TA  acceptance  measure  Q,^  has  limit  zero,  and  thus  for 
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any  locally  minimal  state  i  (e.g.,  c,  <  Cj  for  all  j  &!N'(i)),  there  exists  an  iteration  such 
that  Cj  -  c,  =  A,  ^  for  all  j  e  9^(i)  and  all  k>kf^.  Hence  state  i  no  longer 

communicates  with  any  state  J  eQ.  (e.g.,  Pr(Qt  ^  a,  =  0  for  all  j  &9^(i)  and  k^k^), 

and  thus  irreducibility  fails.  Without  irreducibility.  Theorem  2. 1  cannot  be  used  to  show 
the  existence  of  the  sequence  of  stationary  probability  distributions  necessary  for  the 
proofs  of  convergence  of  Theorems  3.2, 4.4,  or  5.1.  ■ 
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CHAPTER  6:  COMPUTATIONAL  RESULTS 


This  chapter  presents  the  methodology  and  computational  results  for  experiments 
conducted  on  three  discrete  optimization  problems.  The  purpose  of  the  experiments  was 
to  illustrate  the  relationships  between  the  GHC  algorithm  formulations  and  their  finite-time 
performance  on  the  different  problems.  The  three  problems  include: 

(a)  a  flexible  assembly  system  design  (FASD)  problem  (Kumar  and 
Jacobson  [1996]), 

(b)  a  generic  configuration  space  (Fleischer  [1994]), 

(c)  an  Air  Force  manufacturing  process  design  problem,  which  involves 
optimizing  machine  sequencing,  workpiece,  and  machine  parameter  settings 
(Walker  [1992]). 

The  five  GHC  algorithm  formulations  presented  include; 

(i)  simulated  annealing,  where  (i,j)  is  defined  as  in  Section  3.3.1, 

(ii)  threshold  accepting,  where  RtQJ)  is  defined  as  in  Section  5.4.2, 

(iii)  a  Weibull  formulation,  where  R^ityj)  is  defined  as  in  Section  3.3.3, 

(iv)  local  search,  where  J?t(/,y)  s  0,  for  all  i,j  eQ  and  all  k,  and 

(v)  Monte  Carlo  search,  where  {i,j)  =  *oo ,  for  all  i,j  6  Q  and  all  k. 

Tests  of  each  R^iiJ)  formulation  were  conducted  for  each  problem,  to  determine  which 
GHC  algorithm  (of  the  five  formulations  presented)  reached  the  best  solution  (i.e.,  the 
solution  with  the  lowest  objective  function  value),  for  a  fixed  number  of  iterations. 
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The  experiments  were  designed  using  guidelines  proposed  by  Crowder,  Dembo 
and  Mulvey  [1979],  All  computations  were  performed  in  C  on  either  a  SUN  Ultra- 1 
SPARCstation,  using  the  SunOS  5.5  operating  system,  or  a  SUN  10-51  SPARCstation 
using  the  SunOS  5.3  operating  system.  All  random  numbers  were  generated  using  a 
uniform  (0,1)  pseudo-random  linear  congruental  generator,  with  multiplier  16807, 
modulus  2147483647,  and  increment  0  (see  Press  et.  al  [1992,  pg  278]  or  Law  and  Kelton 
[1991,  pg  424]).  The  same  seed,  123,  was  used  to  initiate  all  experiments. 

6.1  Flexible  Assembly  System  Design  Problem 

The  FASD  problem  is  a  precedence-constrained  scheduling  problem,  in  which  a  set 
of  N  tasks  must  be  completed  by  a  set  of  Z  processor  types.  Each  processor  type  can 
complete  a  subset  of  the  tasks,  but  each  task  can  be  completed  by  only  one  processor  type. 
The  goal  is  to  determine  a  sequence  of  processors  that  minimizes  the  total  required 
number  of  processors,  while  satisfying  precedence  constraints  between  tasks.  In  essence, 
this  problem  is  a  precedence-constrained  Hamiltonian  path  problem,  where  each  task 
constitutes  a  vertex,  and  the  cost  of  each  path  is  the  sum  of  the  processors  (of  all  types) 
needed  to  complete  the  path.  Furthermore,  a  path  is  feasible  if  and  only  if  the  Hamiltonian 
path  satisfies  all  precedence  constraints  between  the  tasks.  Kumar  and  Jacobson  [1996] 
show  that  the  FASD  problem  is  NP-complete  (Garey  and  Johnson  [1979,  pg  17]). 
Jacobson  et.  al  [1996]  propose  a  simple  matrix-based,  polynomial-time  lower  bound 
algorithm  for  the  problem. 
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The  FASD  solution  space  is  composed  of  A-tuples  of  tasks,  where  a  solution  is 
an  ordering  of  the  N  tasks.  Therefore,  the  size  of  the  solution  space,  card(Q) ,  is  N\.  A 
neighbor  of  a  solution  is  defined  by  interchanging  the  positions  of  two  tasks  in  the  N-tuple 

solution.  Therefore,  each  solution,  i  eCl,  has  neighbors.  Lastly,  the  objective 

function  is  defined  by  first  penalizing  all  precedence  relation  violations,  then  computing 
the  sum  of  all  penalties,  and  finally  adding  the  total  number  of  required  processors. 

6.1.1  Experimental  Design 

Three  FASD  test  problems  were  constructed:  N  =  50  tasks  with  Z  =  20 
processor  types;  =  100  tasks  with  Z  =  20  processor  types;  and  A"  =  150  tasks  with 
Z  =  20  processor  types.  Each  task  has  at  most  five  predecessors,  and  exactly  one 
processor  type,  for  all  three  test  problems.  An  experiment  was  specified  by  selecting  a 
FASD  problem,  an  formulation,  and  a  corresponding  set  of  inputs  describing  the 

number  of  iterations  and  the  parameter’s  behavior,  and  then  executing  the  GHC 

algorithm  formulation  on  one  hundred  randomly  generated  problem  instances.  Each 
problem  instance  was  generated  using  its  own  unique  random  number  seed.  The  same 
seeds  were  used  to  generate  the  problem  instances  across  all  experiments  (i.e.,  all 
experiments  on  each  problem  used  the  same  problem  instances).  Furthermore,  the  same 
(feasible)  initial  solution  was  used  for  each  instance  of  every  experiment. 

For  each  FASD  problem  instance,  the  solution  with  the  minimum  objective 
function  value  found  by  the  GHC  algorithm  (to  date)  was  recorded,  and  the  solution’s 
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feasibility  was  noted.  For  each  experiment,  the  average  minimum  objective  function  value 
(i.e.,  the  average  minimum  number  of  required  processors),  the  standard  deviation,  and 
maximum  and  minimum  objective  function  values,  were  computed  from  the  set  of  feasible 
(best  to  date)  solutions  obtained  from  the  one  hundred  problem  instances. 

6.1.2  Results 

The  first  set  of  experiments  were  conducted  to  determine  how  the  five  selected 
GHC  parameter  choices,  the  problem  size,  and  the  number  of  iterations,  affected 

the  performance  of  the  GHC  algorithm  on  the  FASD  problem.  The  results  are  presented 
in  Tables  6.1  -  6.5.  Of  the  five  GHC  formulations,  SA’s  performance  was  the  least 
sensitive  to  changes  in  its  governing  parameters  (e.g.,  its  initial  temperature  its  cooling 

parameter  ((),  and  its  inner  and  outer  loop  values  M  and  K).  The  Weibull  formulation 
experiments  used  the  same  1^,  ([),  M,  and  K  parameters  as  used  for  the  SA  formulation, 

hence  it  measured  how  the  shape  parameter  a  would  affect  the  performance  of  SA.  Two 
Weibull  shape  parameters  were  considered:  a  =  1  /  2 ,  and  a  =  2 .  Note  that  the  a  =  1  /  2 
case  has  the  effect  of  increasing  the  probabilities  of  SA  hill  climbing  transitions; 
conversely,  the  a  =  2  case  reduces  the  corresponding  transition  probabilities.  For  the 
FASD  problem,  the  a  =  2  case  virtually  guaranteed  that  for  each  trial,  the  final  solution 
would  be  feasible.  However,  the  Weibull  formulation  did  not  measurably  improve  the 
mean  final  objective  function  values  over  those  found  by  the  SA  formulation. 
Furthermore,  the  GHC  algorithm  a  =  1  /  2  Weibull  formulation  was  unable  to  find  as 
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Table  6.1.  FASD  results  for  the  SA  GHC  algorithm,  where  =  1.0 ,  and  =  <l>^t  • 


Note:  cpu  times  are  for  the  SUN  Ultra-1,  and  are  typical  of  the  five  GHC  formulations. 


Outer 

loop 

K 

Inner 

loop 

M 

Decrement 

(t> 

Mean  Standard 

objective  deviation 
function 
value 

Fraction 

feasible 

Minimiun 

objective 

function 

value 

Maximum 

objective 

function 

value 

(50  tasks,  20  processors) 

500  200  .99 

(14  cpu  minutes) 
29.31  1.98 

1.00 

24 

34 

250 

400 

.95 

29.31 

1.85 

1.00 

24 

34 

100 

1000 

.90 

29.21 

1.93 

0.99 

25 

34 

100 

1000 

.85 

29.21 

2.00 

1.00 

24 

33 

500 

20 

.99 

(1.5  cpu  minutes) 
31.37  2.18 

1.00 

26 

36 

250 

40 

.95 

31.28 

2.13 

1.00 

27 

38 

100 

100 

.90 

31.37 

2.34 

1.00 

24 

37 

100 

100 

.85 

31.36 

1.93 

1.00 

26 

36 

(100  tasks,  20  processors) 

500  200  .99 

(25  cpu  minutes) 
47.52  3.14 

0.99 

37 

56 

250 

400 

.95 

47.72 

2.75 

0.99 

39 

54 

100 

1000 

.90 

47.61 

2.99 

0.96 

39 

54 

100 

1000 

.85 

47.84 

3.04 

0.97 

40 

56 

500 

20 

.99 

(2.5  cpu  minutes) 
58.38  3.08 

0.97 

51 

66 

250 

40 

.95 

57.25 

3.16 

0.99 

51 

65 

100 

100 

.90 

57.13 

3.02 

0.98 

49 

64 

100 

100 

.85 

57.37 

3.41 

0.98 

48 

64 

(150  tasks,  20  processors) 

500  200  .99 

(37  cpu  minutes) 
66.11  3.60 

0.96 

58 

74 

250 

400 

.95 

65.77 

4.13 

0.96 

59 

79 

100 

1000 

.90 

65.71 

3.80 

0.97 

58 

77 

100 

1000 

.85 

65.88 

3.93 

0.94 

57 

75 

500 

20 

.99 

(4  cpu  minutes) 
88.90  3.71 

0.96 

79 

96 

250 

40 

.95 

87.85 

3.62 

0.98 

75 

95 

100 

100 

.90 

87.96 

3.48 

0.97 

81 

96 

100 

100 

.85 

87.52 

3.68 

0.98 

78 

97 

93 


Table  6.2.  FASD  results  for  the  TA  GHC  algorithm,  where  -^Qk- 


Outer 

loop 

K 

Inner 

loop 

M 

Deere-  Initial 

ment  threshold 

<1)  2o 

Mean 

objective 

function 

value 

Standard 

deviation 

Fraction 

feasible 

Minimum 

objective 

function 

value 

Maximum 

objective 

function 

value 

(50  tasks,  20 

processors) 

500 

200 

0.995 

10 

31.36 

2.14 

1.00 

27 

36 

50 

2000 

0.90 

10 

29.76 

1.81 

1.00 

25 

34 

500 

200 

0.995 

100 

47.29 

2.15 

0.21 

42 

50 

50 

2000 

0.90 

100 

30.98 

1.92 

1.00 

25 

36 

500 

20 

0.995 

10 

38.38 

2.33 

0.88 

34 

48 

50 

200 

0.90 

10 

31.90 

2.22 

0.81 

26 

35 

500 

20 

0.995 

100 

47.29 

2.15 

0.21 

42 

50 

50 

200 

0.90 

100 

36.93 

2.46 

0.69 

31 

46 

(100  tasks,  20  processors) 

500 

200 

0.995 

10 

58.19 

3.38 

0.98 

50 

67 

50 

2000 

0.90 

10 

48.17 

3.24 

0.92 

42 

56 

500 

200 

0.995 

100 

94.94 

1.73 

0.16 

92 

98 

50 

2000 

0.90 

100 

55.26 

3.36 

0.93 

43 

64 

500 

20 

0.995 

10 

82.00 

5.63 

0.30 

72 

95 

50 

200 

0.90 

10 

61.42 

3.54 

0.33 

54 

69 

500 

20 

0.995 

100 

94.94 

1.73 

0.16 

92 

98 

50 

200 

0.90 

100 

87.14 

9.06 

0.14 

72 

98 

(150  tasks,  20  processors) 

500 

200 

0.995 

10 

89.11 

3.71 

0.85 

81 

96 

50 

2000 

0.90 

10 

67.46 

3.51 

0.78 

60 

76 

500 

200 

0.995 

100 

142.37 

2.45 

0.19 

136 

146 

50 

2000 

0.90 

100 

83.30 

3.33 

0.67 

73 

93 

500 

20 

0.995 

10 

140.09 

4.96 

0.22 

123 

147 

50 

200 

0.90 

10 

96.00 

1.41 

0.02 

95 

97 

500 

20 

0.995 

100 

142.37 

2.45 

0.19 

136 

146 

50 

200 

0.90 

100 

142.54 

2.18 

0.13 

138 

146 

94 


Table  6.3.  FASD  results  for  the  WeibuU  GHC  algorithm. 


where  4  =  1.0,  and  =^tf 


Outer 

loop 

K 

Inner 

loop 

M 

Decre¬ 

ment 

(1) 

Shape 

parameter 

a 

Mean 

objective 

function 

value 

Standard 

deviation 

Fraction 

feasible 

Minimum 

objective 

function 

value 

Maxi¬ 

mum 

objective 

function 

value 

(50  tasks,  20  processors) 

250 

400 

0.95 

0.5 

29.26 

2.16 

0.92 

24 

38 

250 

400 

0.95 

2.0 

29.34 

1.92 

1.00 

25 

34 

100 

1000 

0.85 

0.5 

29.42 

1.95 

0.93 

25 

35 

100 

1000 

0.85 

2.0 

29.28 

1.80 

1.00 

25 

33 

250 

40 

0.95 

0.5 

31.00 

2.23 

0.73 

26 

35 

250 

40 

0.95 

2.0 

31.40 

2.02 

1.00 

25 

36 

100 

100 

0.85 

0.5 

31.25 

2.16 

0.79 

25 

36 

100 

100 

0.85 

2.0 

31.33 

2.01 

1.00 

27 

36 

(100  tasks,  20  processors) 

250 

400 

0.95 

0.5 

46.69 

2.84 

0.62 

40 

53 

250 

400 

0.95 

2.0 

47.94 

2.68 

1.00 

41 

55 

100 

1000 

0.85 

0.5 

47.48 

2.92 

0.64 

40 

53 

100 

1000 

0.85 

2.0 

47.97 

2.90 

1.00 

41 

55 

250 

40 

0.95 

0.5 

57.87 

2.76 

0.53 

52 

64 

250 

40 

0.95 

2.0 

57.60 

3.37 

1.00 

48 

65 

100 

100 

0.85 

0.5 

57.66 

3.45 

0.53 

50 

66 

100 

100 

0.85 

2.0 

57.37 

3.32 

1.00 

48 

66 

(150  tasks,  20  processors) 

250 

400 

0.95 

0.5 

65.09 

3.61 

0.43 

59 

73 

250 

400 

0.95 

2.0 

65.88 

3.69 

1.00 

57 

76 

100 

1000 

0.85 

0.5 

65.18 

3.13 

0.38 

59 

73 

100 

1000 

0.85 

2.0 

65.98 

3.42 

1.00 

58 

75 

250 

40 

0.95 

0.5 

89.29 

3.77 

0.45 

83 

100 

250 

40 

0.95 

2.0 

86.88 

4.11 

1.00 

78 

97 

100 

100 

0.85 

0.5 

88.18 

4.32 

0.38 

81 

97 

100 

100 

0.85 

2.0 

86.92 

3.67 

1.00 

78 

96 

95 


Table  6.4.  FASD  results  for  the  local  search  GHC  algorithm. 


Outer 

loop 

K 

Inner 

loop 

M 

Para¬ 

meter 

RiiiJ) 

Mean 

objective 

function 

value 

Standard 

deviation 

Fraction 

feasible 

Minimum 

objective 

fimction 

value 

Maximum 

objective 

function 

value 

(50  tasks,  20  processors) 

1 

100000 

0 

36.46 

2.44 

1.00 

30 

43 

1 

10000 

0 

36.46 

2.44 

1.00 

30 

43 

(100  tasks,  20  processors) 

1 

100000 

0 

62.42 

3.43 

1.00 

51 

71 

1 

10000 

0 

63.99 

3.30 

1.00 

54 

72 

(150  tasks,  20  processors) 

1 

100000 

0 

87.82 

5.86 

1.00 

76 

107 

1 

10000 

0 

94.04 

4.46 

1.00 

84 

107 

Table  6.5.  FASD  results  for  the  Monte  Carlo  GHC  algorithm. 


Outer 

loop 

K 

Inner 

loop 

M 

Para¬ 

meter 

UiJ) 

Mean 

objective 

function 

value 

Standard 

deviation 

Fraction 

feasible 

Minimum 

objective 

fimction 

value 

Maximum 

objective 

fimction 

value 

(50  tasks,  20  processors) 

1 

100000 

le06 

47.29 

2.15 

0.21 

42 

50 

1 

10000 

le06 

47.29 

2.15 

0.21 

42 

50 

(100  tasks,  20  processors) 

1 

100000 

le06 

94.94 

1.73 

0.16 

92 

98 

1 

10000 

le06 

94.94 

1.73 

0.16 

92 

98 

(150  tasks,  20  processors) 

1 

100000 

le06 

142.37 

2.45 

0.19 

136 

146 

1 

10000 

le06 

142.37 

2.45 

0.19 

136 

145 

96 


many  feasible  final  solutions  as  the  corresponding  SA  formulation,  and  the  feasibility 
percentage  decreased  for  the  larger  FASD  problem  instances. 

Of  the  three  hill  climbing  GHC  formulations,  TA’s  performance  was  the  most 
sensitive  to  changes  in  its  parameters  (e.g.,  its  initial  threshold  ,  its  reduction  parameter 

(J),  and  its  inner  and  outer  loop  values  M  and  K).  TA  achieved  its  best  results  for  FASD  by 
using  a  low  initial  threshold  and  a  fast  threshold  decrement  rate,  (i.e.,  by  quickly 
converging  to  local  search).  However,  TA  was  generally  outperformed  by  SA  and  the 
Weibull  formulation  (in  terms  of  mean  final  objective  fianction  values  and  feasibility 
percentages),  especially  for  the  larger  FASD  problem  sizes.  Finally,  both  local  search  and 
Monte  Carlo  search  were  outperformed  by  the  three  hill  climbing  algorithms.  Monte 
Carlo  search  was  the  worst  performer,  in  terms  of  mean  objective  function  values  and  final 
solution  feasibility  percentages. 

The  second  set  of  experiments  were  conducted  to  determine  how  the  five  selected 
(^’J)  formulations  performed  over  ten  thousand  iterations  {K  =100,  M  =  100) ,  on  the 
FASD  N  =  501 Z  =  20,  100/20,  and  150/20  test  problems.  The  mean  final  objective 
function  value  found  by  each  R,,(i,f)  formulation  was  recorded  and  plotted  at  500- 
iteration  increments. 

The  results  for  the  FASD  50/20  problem  are  depicted  in  Figure  6.1.  For  SA, 
/(,  =  1.0 ,  and  the  decrement  was  0.85.  The  Weibull  used  the  same  value  and  decrement 
as  SA,  and  set  a  =  0.5 .  TA  used  =  10 .0,  and  the  decrement  was  0.9.  The  results  for 
the  FASD  100/20  and  150/20  problems  are  depicted  in  Figures  6.2  and  6.3,  respectively. 
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For  SA,  /(,  =  1.0 ,  and  the  decrement  was  0.85.  The  Weibull  used  the  same  value  and 
decrement  as  SA,  and  set  a  =  2.0 .  TA  used  Qq=3  .0,  and  the  decrement  was  0.90. 


Note  that  the  Monte  Carlo  formulation  was  unable  to  locate  ar^  solutions  with 
lower  objective  function  values  than  the  initial  solution.  The  local  search  implementation 
did  somewhat  better,  but  was  still  outperformed  by  the  hill  climbing  algorithms.  The  TA, 
SA,  and  Weibull  formulations  all  performed  about  the  same.  The  Monte  Carlo 
formulation’s  poor  performance  suggests  that  the  FASD  problem  probably  contains 
relatively  few  good  solutions,  while  the  local  search  formulation’s  relatively  better 
performance  suggests  that  neighboring  solutions  are  likely  to  have  similar  objective 
function  values  (e.g.,  neighborhoods  form  regions  of  attraction  around  local  optima). 


-  SA 

..  Q 

local  search 

—weibull 

X 

TA 

•  ■  X 

Monte  Carlo 

Figure  6. 1  Comparison  of  Acceptance  Formulations 
for  the  50/20  FASD  Problem. 
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Figure  6.2  Comparison  of  Acceptance  Formulations 
for  the  100/20  FASD  Problem. 
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6.2  Generic  Configuration  Space  Problem 

The  generic  configuration  space  problem  (Fleischer  [1994])  was  selected  because 
it  allows  the  researcher  to  precisely  control  each  test  problem’s  number,  depth,  and 
location  of  local  and  global  optima.  To  create  a  generic  configuration  space,  a  set  of 
objective  function  values  are  randomly  generated  and  stored  in  memory.  A  neighborhood 
is  then  defined  by  linking  each  objective  fimction  value  to  any  specified  number  of  other 
objective  function  values.  The  main  disadvantage  of  the  generic  configuration  space 
approach  is  that  its  computer  memory  requirement  is  excessive  for  large  solution  space 
cardinalities. 

6.2.1  Experimental  Design 

The  goal  of  the  generic  configuration  space  experiments  was  to  determine  whether 
neighborhood  size  and  the  range  in  objective  function  values  (i.e.,  the  problem’s  objective 
function  topology)  would  affect  the  performance  of  the  five  GHC  algorithm  formulations. 
Two  test  problems  were  created:  for  the  first  problem,  a  vector  of  five  thousand  objective 
function  values  was  defined,  with  values  between  zero  and  one  hundred.  For  the  second 
problem,  a  vector  of  five  thousand  objective  fimction  values  was  again  defined,  but  with 
values  between  zero  and  five  hundred.  An  experiment  was  defined  by  executing  one  of 
the  five  GHC  algorithm  formulations  on  one  hundred  randomly  generated  problem 
instances.  The  same  initial  solution  (vector  element)  was  used  for  each  instance  of  each 
experiment.  Finally,  for  each  experiment,  the  mean  best  to  date  objective  fimction  value. 
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the  standard  deviation,  and  maximum  and  minimum  objective  function  values,  were 
computed. 

The  experiments  assessed  whether  changing  the  total  number  of  iterations  (one 
hundred,  five  hundred,  or  one  thousand),  the  neighborhood  size  (from  ten  to  fifty 
neighbors,  in  steps  of  ten),  or  the  range  in  objective  function  values  (zero  to  one  hundred, 
or  zero  to  five  hundred),  would  affect  the  performance  of  the  five  GHC  algorithm 
formulations.  Finally,  pilot  runs  were  conducted  for  the  SA,  TA,  and  Weibull  GHC 
formulations.  The  pilot  runs  were  used  to  tune  the  input  parameters  of  the  respective 
GHC  formulations  to  obtain  their  best  possible  performance  for  the  generic  configuration 
space  problem. 

6.2.2  Results 

Tables  6.6  -  6.10  depict  the  results  for  the  five  selected  GHC  algorithm 
formulations.  For  the  generic  configuration  space,  the  five  formulations  performed  much 
differently  with  respect  to  each  other  than  they  had  for  the  FASD  problem  in  Section  6.1. 
The  Monte  Carlo  GHC  formulation  (see  Table  6.10)  consistently  found  mean  final 
objective  function  values  that  were  less  than  or  equal  to  those  found  by  the  four  other 
GHC  algorithm  formulations.  The  local  search  formulation  was  the  worst  performer,  and 
the  TA,  SA,  and  Weibull  formulations  aU  performed  about  the  same. 

For  all  five  GHC  formulations,  larger  neighborhoods  and  tighter  objective  function 
value  ranges  had  the  effect  of  improving  performance.  Increasing  the  total  number  of 
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iterations  improved  the  performance  of  all  GHC  formulations  except  local  search,  which 
apparently  became  trapped  in  local  minima  within  the  first  100  iterations. 


Table  6.6.  Results  for  the  SA  GHC  algorithm,  where  =  300,  and  =  0.95 . 


Outer 

loop 

K 

Inner 

loop 

M 

Objective 

function 

values 

Neighbor¬ 
hood  size 

Mean 

objective 

function 

value 

Standard 

deviation 

Minimiun 

objective 

function 

value 

Maximum 

objective 

function 

value 

100 

10 

0-100 

10 

0.69 

0.97 

0 

5 

100 

10 

0-100 

20 

0.36 

0.64 

0 

3 

100 

10 

0-100 

30 

0.14 

0.40 

0 

2 

100 

10 

0-100 

40 

0.09 

0.35 

0 

2 

100 

10 

0-100 

50 

0.04 

0.20 

0 

1 

100 

10 

0-500 

10 

9.12 

9.03 

0 

50 

100 

10 

0-500 

20 

5.31 

5.38 

0 

23 

100 

10 

0-500 

30 

3.97 

5.20 

0 

40 

100 

10 

0-500 

40 

2.87 

3.41 

0 

17 

100 

10 

0-500 

50 

2.18 

2.65 

0 

15 

100 

5 

0-100 

10 

1.28 

1.58 

0 

9 

100 

5 

0-100 

20 

0.63 

0.98 

0 

5 

100 

5 

0-100 

30 

0.34 

0.74 

0 

5 

100 

5 

0-100 

40 

0.19 

0.53 

0 

3 

100 

5 

0-100 

50 

0.18 

0.44 

0 

2 

100 

5 

0-500 

10 

11.52 

11.61 

0 

62 

100 

5 

0-500 

20 

7.97 

9.51 

0 

62 

100 

5 

0-500 

30 

5.11 

5.59 

0 

30 

100 

5 

0-500 

40 

3.69 

4.94 

0 

34 

100 

5 

0-500 

50 

2.72 

3.10 

0 

19 

100 

1 

0-100 

10 

3.72 

4.11 

0 

17 

100 

1 

0-100 

20 

1.86 

2.55 

0 

13 

100 

1 

0-100 

30 

1.61 

2.13 

0 

9 

100 

1 

0-100 

40 

1.24 

1.70 

0 

12 

100 

1 

0-100 

50 

1.45 

2.00 

0 

13 

100 

1 

0-500 

10 

23.33 

22.2 

0 

91 

100 

1 

0-500 

20 

14.28 

13.87 

0 

58 

100 

1 

0-500 

30 

10.24 

11.20 

0 

67 

100 

1 

0-500 

40 

8.39 

9.60 

0 

51 

1 

0-500 

50 

7.15 

7.87 

0 

45 
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Table  6.7.  Results  for  the  TA  GHC  algorithm,  where  =  300 ,  and  =  0.950^. . 


Outer 

loop 

K 

Inner 

loop 

M 

Objective 

function 

values 

Neighbor¬ 
hood  size 

Mean 

objective 

function 

value 

Standard 

deviation 

Minimum 

objective 

function 

value 

Maximiun 

objective 

function 

value 

100 

10 

0-100 

10 

0.83 

1.17 

0 

5 

100 

10 

0-100 

20 

0.37 

0.77 

0 

5 

100 

10 

0-100 

30 

0.15 

0.41 

0 

2 

100 

10 

0-100 

40 

0.01 

0.10 

0 

1 

100 

10 

0-100 

50 

0.06 

0.28 

0 

2 

100 

10 

0-500 

10 

7.72 

7.60 

0 

28 

100 

10 

0-500 

20 

5.58 

6.03 

0 

29 

100 

10 

0-500 

30 

3.91 

4.50 

0 

20 

100 

10 

0-500 

40 

3.11 

3.55 

0 

16 

100 

10 

0-500 

50 

1.97 

2.32 

0 

11 

100 

5 

0-100 

10 

1.31 

1.77 

0 

9 

100 

5 

0-100 

20 

0.62 

1.03 

0 

5 

100 

5 

0-100 

30 

0.30 

0.61 

0 

3 

100 

5 

0-100 

40 

0.17 

0.47 

0 

2 

100 

5 

0-100 

50 

0.16 

0.42 

0 

2 

100 

5 

0-500 

10 

12.71 

13.38 

0 

75 

100 

5 

0-500 

20 

7.46 

7.07 

0 

28 

100 

5 

0-500 

30 

5.10 

6.17 

0 

47 

100 

5 

0-500 

40 

3.78 

3.96 

0 

18 

100 

5 

0-500 

50 

3.79 

4.63 

0 

20 

100 

1 

0-100 

10 

3.04 

3.33 

0 

16 

100 

1 

0-100 

20 

2.02 

2.71 

0 

14 

100 

1 

0-100 

30 

1.37 

1.76 

0 

7 

100 

1 

0-100 

40 

1.07 

1.43 

0 

7 

100 

1 

0-100 

50 

1.07 

1.47 

0 

8 

100 

1 

0-500 

10 

20.71 

19.86 

0 

78 

100 

1 

0-500 

20 

12.51 

13.26 

0 

78 

100 

1 

0-500 

30 

9.04 

8.72 

0 

47 

100 

1 

0-500 

40 

8.39 

9.28 

0 

47 

100 

1 

0-500 

50 

7.50 

7.34 

0 

34 
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Table  6.8.  Results  for  the  Weibull  GHC  algorithm,  where 
=  300,  =  0.954,  “  =  0.25 . 


Outer 

loop 

K 

Inner 

loop 

M 

Objective 

function 

values 

Neighbor¬ 
hood  size 

Mean 

objective 

function 

value 

Standard 

deviation 

Minimiun 

objective 

function 

value 

Maximum 

objective 

function 

value 

100 

10 

0-100 

10 

0.69 

1.01 

0 

5 

100 

10 

0-100 

20 

0.41 

0.70 

0 

3 

100 

10 

0-100 

30 

0.18 

0.48 

0 

2 

100 

10 

0-100 

40 

0.10 

0.36 

0 

2 

100 

10 

0-100 

50 

0.08 

0.31 

0 

2 

100 

10 

0-500 

10 

6.57 

7.55 

0 

53 

100 

10 

0-500 

20 

3.44 

3.60 

0 

17 

100 

10 

0-500 

30 

2.59 

3.23 

0 

20 

100 

10 

0-500 

40 

1.90 

2.32 

0 

16 

100 

10 

0-500 

50 

1.50 

1.91 

0 

10 

100 

5 

0-100 

10 

1.07 

1.49 

0 

9 

100 

5 

0-100 

20 

0.46 

0.74 

0 

3 

100 

5 

0-100 

30 

0.24 

0.55 

0 

3 

100 

5 

0-100 

40 

0.20 

0.47 

0 

2 

100 

5 

0-100 

50 

0.11 

0.35 

0 

2 

100 

5 

0-500 

10 

8.89 

10.38 

0 

62 

100 

5 

0-500 

20 

5.06 

6.90 

0 

47 

100 

5 

0-500 

30 

4.14 

4.44 

0 

23 

100 

5 

0-500 

40 

3.05 

3.74 

0 

16 

100 

5 

0-500 

50 

2.45 

3.16 

0 

15 

100 

1 

0-100 

10 

2.82 

3.22 

0 

14 

100 

1 

0-100 

20 

1.85 

2.63 

0 

13 

100 

1 

0-100 

30 

1.22 

1.51 

0 

9 

100 

1 

0-100 

40 

1.21 

1.82 

0 

10 

100 

1 

0-100 

50 

1.09 

1.75 

0 

10 

100 

1 

0-500 

10 

16.78 

19.37 

0 

141 

100 

1 

0-500 

20 

10.92 

11.38 

0 

53 

100 

1 

0-500 

30 

10.03 

11.33 

0 

58 

100 

1 

0-500 

40 

7.37 

7.79 

0 

43 

100 

1 

0-500 

50 

6.64 

6.97 

0 

36 

104 


Table  6.9.  Results  for  the  local  search  GHC  algorithm. 


Outer 

loop 

K 

Inner 

loop 

M 

Objective 

function 

values 

Neighbor¬ 
hood  size 

Mean 

objective 

function 

value 

Standard 

deviation 

Minimiun 

objective 

function 

value 

Maximxun 

objective 

function 

value 

1 

1000 

0-100 

10 

6.28 

6.22 

0 

27 

1 

1000 

0-100 

20 

3.29 

3.71 

0 

16 

1 

1000 

0-100 

30 

2.38 

2.81 

0 

13 

1 

1000 

0-100 

40 

1.59 

2.17 

0 

10 

1 

1000 

0-100 

50 

0.95 

1.25 

0 

6 

1 

1000 

0-500 

10 

36.88 

37.34 

0 

188 

1 

1000 

0-500 

20 

21.01 

23.93 

0 

156 

1 

1000 

0-500 

30 

11.36 

11.62 

0 

54 

1 

1000 

0-500 

40 

9.37 

9.11 

0 

50 

1 

1000 

0-500 

50 

7.58 

9.52 

0 

47 

1 

500 

0-100 

10 

6.28 

6.22 

0 

27 

1 

500 

0-100 

20 

3.29 

3.71 

0 

16 

1 

500 

0-100 

30 

2.38 

2.81 

0 

13 

1 

500 

0-100 

40 

1.59 

2.17 

0 

10 

1 

500 

0-100 

50 

0.95 

1.25 

0 

6 

1 

500 

0-500 

10 

36.88 

37.34 

0 

188 

1 

500 

0-500 

20 

21.01 

23.93 

0 

156 

1 

500 

0-500 

30 

11.36 

11.62 

0 

54 

1 

500 

0-500 

40 

9.37 

9.11 

0 

50 

1 

500 

0-500 

50 

7.58 

9.52 

0 

47 

1 

100 

0-100 

10 

6.28 

6.22 

0 

27 

1 

100 

0-100 

20 

3.29 

3.71 

0 

16 

1 

100 

0-100 

30 

2.38 

2.81 

0 

13 

1 

100 

0-100 

40 

1.59 

2.17 

0 

10 

1 

100 

0-100 

50 

0.95 

1.25 

0 

6 

1 

100 

0-500 

10 

36.88 

37.34 

0 

188 

1 

100 

0-500 

20 

21.01 

23.93 

0 

156 

1 

100 

0-500 

30 

11.36 

11.62 

0 

54 

1 

100 

0-500 

40 

9.37 

9.11 

0 

50 

1 

100 

0-500 

50 

7.58 

9.52 

0 

47 
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Table  6. 10.  Results  for  the  Monte  Carlo  GHC  algorithm. 


Outer 

loop 

K 

Inner 

loop 

M 

Objective 

function 

values 

Neighbor¬ 
hood  size 

Mean 

objective 

function 

value 

Standard 

deviation 

Minimum 

objective 

function 

value 

Maximum 

objective 

function 

value 

1 

1000 

0-100 

10 

0.37 

0.72 

0 

3 

1 

1000 

0-100 

20 

0.07 

0.33 

0 

2 

1 

1000 

0-100 

30 

0.08 

0.34 

0 

2 

1 

1000 

0-100 

40 

0.03 

0.22 

0 

2 

1 

1000 

0-100 

50 

0.01 

0.10 

0 

1 

1 

1000 

0-500 

10 

3.61 

4.5 

0 

22 

1 

1000 

0-500 

20 

1.72 

2.16 

0 

10 

1 

1000 

0-500 

30 

1.21 

1.65 

0 

8 

1 

1000 

0-500 

40 

0.93 

1.3 

0 

6 

1 

1000 

0-500 

50 

0.86 

1.33 

0 

6 

1 

500 

0-100 

10 

0.81 

1.13 

0 

5 

1 

500 

0-100 

20 

0.39 

0.76 

0 

4 

1 

500 

0-100 

30 

0.17 

0.43 

0 

2 

1 

500 

0-100 

40 

0.08 

0.31 

0 

2 

1 

500 

0-100 

50 

0.04 

0.24 

0 

2 

1 

500 

0-500 

10 

4.96 

5.59 

0 

27 

1 

500 

0-500 

20 

3.47 

4.51 

0 

20 

1 

500 

0-500 

30 

2.28 

2.94 

0 

17 

1 

500 

0-500 

40 

2.25 

2.87 

0 

20 

1 

500 

0-500 

50 

1.67 

2.06 

0 

10 

1 

100 

0-100 

10 

2.20 

2.85 

0 

16 

1 

100 

0-100 

20 

1.62 

2.26 

0 

14 

1 

100 

0-100 

30 

1.23 

2.04 

0 

14 

1 

100 

0-100 

40 

0.87 

1.30 

0 

7 

1 

100 

0-100 

50 

1.01 

1.47 

0 

8 

1 

100 

0-500 

10 

12.06 

13.75 

0 

93 

1 

100 

0-500 

20 

8.63 

8.58 

0 

43 

1 

100 

0-500 

30 

7.0 

7.32 

0 

35 

1 

100 

0-500 

40 

6.71 

6.12 

0 

27 

1 

100 

0-500 

50 

6.29 

6.23 

0 

29 
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One  explanation  for  the  success  of  the  Monte  Carlo  GHC  formulation  is  that  the 
five  thousand  objective  function  values  in  each  problem  were  generated  uniformly 
randomly  between  their  minimum  and  maximum  allowable  values.  Therefore  a  global 
maximum  was  as  likely  to  be  a  neighbor  of  a  global  minimum  as  any  other  solution,  and  so 
the  objective  function  topology  was  highly  irregular,  with  small  or  nonexistent  regions  of 
attraction  surrounding  local  minima.  Another  explanation  is  that  for  each  experiment,  the 
GHC  algorithm  had  the  opportunity  to  search  a  significant  proportion  of  the  solution 
space  (between  2%  and  20%,  depending  on  the  number  of  iterations  performed).  These 
percentages  are  much  higher  than  would  occur  if  the  GHC  algorithm  was  executed  on  a 
more  typical  discrete  optimization  problem,  and  thus  would  also  favor  Monte  Carlo 
search.  Research  is  needed  to  assess  how  the  five  GHC  formulations  would  perform  on  a 
smoother  objective  function  topology,  such  as  a  discretized  version  of  the  problem 
suggested  by  Bohachevsky,  Johnson,  and  Stein  [1986,  Figure  1]. 

6.3  Air  Force  Manufacturing  Process  Design  Problem 

The  research  goal  of  this  section  was  to  examine  how  the  five  selected  GHC 
formulations  perform  on  an  Air  Force  manufacturing  process  design  problem.  The  Air 
Force  wishes  to  develop  manufacturing  process  design  tools  that  optimize  the  selection, 
input  parameters,  and  sequencing  of  operations  for  material  shaping  and  microstructure 
treatment.  As  an  initial  application,  the  Air  Force  is  planning  to  use  the  tools  to  design  the 
manufacturing  process  for  an  aircraft  turbine  engine  compressor  rotor.  Traditional 
production  techniques  use  empirical  rules  to  optimize  individual  production  operations 
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(e.g.,  forging,  machining,  and  heat  treatment),  and  sequence  the  individual  operations 
based  on  experience  or  industry  tradition.  In  contrast,  the  Air  Force  wishes  to  optimize 
the  overall  manufacturing  process,  in  order  to  achieve  the  best  balance  of  manufacturing 
cost,  producibility,  and  final  product  properties. 

A  principal  challenge  to  the  Air  Force  approach  is  that  the  size  of  the  solution 
space  Q  can  become  extremely  large  as  the  number  of  potential  process  operations, 
operation  parameters,  and  operation  parameter  values  (discretized,  if  necessary)  are 
increased.  (The  solution  space  cardinality  of  the  problem  used  for  the  computations  in  this 
section  is  approximately  10^°.)  In  addition,  Sullivan  [1996]  describes  the  difBculties  of 
developing  a  measure  (e.g.,  an  objective  function)  that  accurately  captures  the  process 
stability,  product  properties,  and  total  cost. 

With  the  assistance  of  Wright  Laboratory  and  Ohio  University  personnel,  Sullivan 
[1996]  analyzes  three  candidate  process  designs  for  the  Air  Force  initial  application.  This 
research  selects  one  of  the  designs— an  upset  forge,  rough  machine,  and  finish  machine 
sequence,  and  uses  the  design  to  compare  the  respective  performance  of  the  five  GHC 
algorithm  formulations.  The  upset  forge,  rough  machine,  and  finish  machine  models  were 
developed  and  coded  in  C  by  researchers  at  Ohio  University  (Sullivan  [1996]).  The 
objective  function  and  process  feasibility  criteria  were  developed  by  a  team  of  Ohio 
University,  Wright  Laboratory,  and  Virginia  Tech  researchers.  (Two  causes  of  process 
infeasibility  are  the  violation  of  a  physical  constraint  (e.g.,  exceeding  a  forge  press 
capacity),  or  the  violation  of  a  material  property  (e.g.,  material  cracking  from  excessive 


deformation  in  a  forging  operation).)  Finally,  a  neighborhood  was  defined  by  randomly 
selecting  one  of  the  process  operations,  then  randomly  selecting  an  operation  parameter, 
and  finally  randomly  perturbing  the  parameter’s  value. 

6.3.1  Experimental  Design 

An  experiment  was  specified  by  selecting  a  set  of  inputs  for  the  forge  and  machine 
model  parameters,  a  GHC  algorithm  (/?t(/,y))  formulation,  and  a  corresponding  set  of 
inputs  describing  the  number  of  iterations  and  the  Rjiii,j)  parameter’s  behavior.  An 
experiment  was  conducted  by  executing  the  GHC  algorithm  formulation  on  thirty  problem 
instances.  Each  problem  instance  was  generated  by  randomly  selecting  a  set  of  inputs  for 
the  model  parameters.  The  same  seeds  were  used  to  generate  the  problem  instances  for  all 
experiments.  All  experiments  were  replicated  twice,  first  for  ten  thousand  iterations  (per 
problem  instance),  and  then  for  one  hundred  thousand  iterations  (per  problem  instance). 

For  each  problem  instance,  the  solution  with  the  minimum  objective  function  value 
found  by  the  GHC  algorithm  was  recorded,  and  the  solution’s  feasibility  was  noted.  For 
each  experiment,  the  mean  minimum  objective  function  value,  the  standard  deviation,  and 
maximum  and  minimum  objective  fUnction  values,  were  computed  from  the  set  of  feasible 
final  solutions  obtained  from  the  thirty  problem  instances. 

6.3.2  Results 

AU  final  solutions  were  feasible  in  each  experiment,  hence  all  results  are  averaged 
over  thirty  trials.  The  SUN  Ultra- 1  computer  required  an  average  of  30  cpu  minutes  for 
each  ten-thousand  iteration  experiment,  and  an  average  of  310  cpu  minutes  for  each  one- 
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hundred-thousand  iteration  experiment.  (This  is  why  only  thirty  trials  were  conducted  per 
experiment,  versus  one  hundred  trials  for  the  problems  m  Sections  6.1  and  6.2.)  The 
results  are  shown  in  Tables  6. 1 1  -  6. 14. 

SA  was  able  to  find  better  results  (lower  mean  objective  function  values)  than  TA 
for  the  ten-thousand  iteration  computations.  The  performance  gap  was  most  pronounced 
for  experiments  in  which  the  (j,j)  parameter  was  very  slowly  decreased  (e.g.,  for  the 
K  -  500,  M  =  20  case  of  Tables  6.11  and  6.12).  SA  and  TA  performed  about  the  same 
for  the  one  hundred-thousand  iteration  computations.  The  Weibull  GHC  formulation 
(Table  6.13)  provided  the  best  overall  performance  for  the  Air  Force  process  design 
problem,  for  the  experiments  in  which  the  shape  parameter  alpha  was  strictly  less  than 
one. 


Table  6.11.  Results  for  the  S  A  GHC  algorithm. 


where  /q  =  10000,  and  =<})/*. 


Outer 

loop 

K 

Inner 

loop 

M 

Decre¬ 

ment 

<!> 

Mean 

objective 

function 

value 

Standard 

deviation 

Minimum 

objective 

function 

value 

Maximum 

objective 

function 

value 

500 

200 

.99 

1645.21 

8.52 

1618.89 

1659.44 

200 

500 

.95 

1655.78 

31.56 

1627.97 

1745.56 

200 

500 

.90 

1700.38 

44.20 

1634.59 

1754.67 

100 

1000 

.85 

1688.28 

43.03 

1636.01 

1752.09 

500 

20 

.99 

1648.24 

7.88 

1624.55 

1659.44 

200 

50 

.95 

1708.38 

73.27 

1638.42 

1852.99 

200 

50 

.90 

1746.56 

73.09 

1624.55 

1826.80 

100 

100 

.85 

1763.14 

76.51 

1648.78 

1858.53 
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Table  6. 12.  Results  for  the  TA  GHC  algorithm. 


where  Qq  =  10000,  and  Qt+i  -^Qk- 
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value 
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objective 

flmction 

value 

500 

200 

.99 

1612  AS 

32.58 

1636.01 

1745.56 

200 

500 

.95 

1652.61 

3.09 

1636.20 

1653.18 

200 

500 

.90 

1699.96 

48.26 

1630.35 

1745.56 

100 

1000 

.85 

1707.12 

46.75 

1624.55 

1745.56 

500 

20 

.99 

\1422A 

40.30 

1636.20 

1819.10 

200 

50 

.95 

1757.43 

59.71 

1624.55 

1819.10 

200 

50 

.90 

1784.11 

57.14 

1636.01 

1852.99 

100 

100 

.85 

1768.48 

67.71 

1624.55 

1845.64 

Table  6.13.  Results  for  the  Weibull  GHC  algorithm,  where  =  10000,  and  =<!)/*. 
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Mean 
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function 

value 
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function 

value 

Maximum 

objective 

function 

value 

500 

200 

0.25 

.99 

1630.32 

7.42 

1620.69 

1647.47 

500 

200 

0.50 

.99 

1630.32 

7.04 

1624.55 

1658.41 

500 

200 

1.25 

.99 

1646.65 

7.85 

1636.01 

1661.98 

500 

200 

0.25 

.95 

1648.29 

23.10 

1636.01 

1745.56 

500 

200 

0.50 

.95 

1675.45 

49.58 

1624.55 

1790.29 

500 

200 

1.25 

.95 

1712.97 

56.94 

1638.42 

1819.10 

500 

20 

0.25 

.99 

1627.94 

27.04 

1492.15 

1661.27 

500 

20 

0.50 

.99 

1643.35 

7.44 

1626.21 

1658.82 

500 

20 

1.25 

.99 

1665.02 

44.44 

1624.55 

1843.65 

500 

20 

1.5 

.99 

1669.46 

44.81 

1633.39 

1772.38 

500 

20 

2.0 

.99 

1704.08 

58.02 

1632.83 

1772.38 

500 

20 

4.0 

.99 

1701.91 

53.64 

1633.39 

1772.38 

500 

20 

0.25 

.95 

1656.82 

70.44 

1492.15 

1828.50 

500 

20 

0.5 

.95 

1731.97 

70.88 

1641.84 

1858.44 

500 

20 

1.25 

.95 

1768.00 

94.50 

1638.56 

2012.66 
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Table  6.14.  Results  for  the  local  search  and  Monte  Carlo  GHC  algorithms. 
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Mean 

objective 

function 

value 
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deviation 

Minimum 

objective 

function 

value 

Maximum 

objective 

function 

value 

Oocal  search) 

1 

100000 

0 

1757.29 

133.88 

1616.82 

2241.37 

1 

10000 

0 

1796.54 

110.56 

1624.55 

2241.37 

(Monte  Carlo  search) 

1 

100000 

le09 

1855.14 

65.97 

1507.71 

1868.41 

1 

10000 

le09 

1852.41 

65.95 

1507.71 

1868.41 

Indeed,  the  ^  =  0.99 ,  a  =  0.25  experiments  reported  the  lowest  average  objective 
function  value  of  all  experiments  conducted.  Note  that  this  ((j),a)  choice  represents  the 
slowest  decreasing  parameter  schedule  of  all  experiments  conducted  for  the  Air 

Force  problem.  Finally,  neither  the  local  search  nor  the  Monte  Carlo  GHC  formulations 
performed  as  well  as  any  of  the  hill  climbing  GHC  algorithms. 

6.4  Summary 

The  FASD,  the  generic  configuration  space,  and  the  Air  Force  manufacturing 
design  problem  experiments  suggest  that  relationships  do  exist  between  the  selected  GHC 
algorithm  formulations  and  finite-time  GHC  algorithm  performance.  The  Monte  Carlo 
formulation  performed  very  well  on  the  generic  configuration  space  problem,  yet 
performed  poorly  on  the  other  two  problems.  For  the  FASD  problem,  TA  was  much 
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more  sensitive  to  parameter  tuning  than  was  SA.  The  Weibull  formulation  performed 
better  for  the  Air  Force  and  generic  configuration  space  problems  when  the  shape 
parameter  a  was  strictly  less  than  one;  conversely,  a  values  strictly  greater  than  one 
worked  better  for  the  FASD  problem.  Additional  research  is  necessary  to  more  explicitly 
quantify  how  each  GHC  algorithm  formulation  performs  on  the  three  problems,  and  to 
examine  additional  GHC  algorithm  formulations,  such  as  a  gamma  random  variable  or  a 
binomial  random  variable. 
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CHAPTER?:  FINALE 


Many  discrete  optimization  (minimization)  problems  are  difficult  to  solve,  i.e.,  are 
NP-complete.  There  are  no  known  polynomial-time  algorithms  that  can  solve  NP- 
complete  problems.  However,  since  NP-complete  problems  contain  many  examples  of 
practical  interest,  heuristic  methods  have  been  developed  that  efficiently  find  near-optimal 
solutions.  Heuristic  methods  can  be  grouped  into  two  conceptual  classes:  a  class  that 
computes  the  best  solution  constructively  starting  from  raw  data,  and  a  class  that 
iteratively  improves  upon  an  existing  solution.  Stochastic  hill  climbing  algorithms  are 
iterative  in  nature,  and  have  the  ability  to  probabilistically  accept  candidate  solutions  with 
higher  cost  than  that  of  the  incumbent  solution,  in  an  effort  to  escape  local  optima. 

7.1  Conclusions  and  Extensions 

This  dissertation  introduces  generalized  hill  climbing  (GHC)  algorithms,  which 
include  several  existing  discrete  optimization  search  algorithms  such  as  simulated 
annealing  (SA),  local  search,  and  threshold  accepting  (TA)  as  particular  formulations.  The 
contributions  from  this  research  focus  on  two  areas:  first,  a  new  method  of  proving 
asymptotic  convergence  of  stochastic  hill  climbing  algorithms  is  presented,  that  relaxes  the 
sufficient  conditions  found  in  the  literature.  This  result  creates  a  large  body  of  convergent 
stochastic  hill  climbing  algorithms,  where  only  SA  existed  previously.  Second,  empirical 
tests  of  the  performance  of  various  GHC  formulations  are  conducted  on  specific  problem 
classes.  These  tests  study  which  probability  distributions  enhance  the  GHC  algorithm’s 
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finite-time  performance  (in  terms  of  solution  quality  versus  algorithm  execution  time)  on 
three  selected  problems,  including  a  flexible  assembly  system  design  (FASD)  problem  (a 
Hamiltonian  path  problem),  a  generic  configuration  space  problem,  and  an  Air  Force 
manufacturing  process  design  problem. 

7.1.1  Knowledge  of  the  Optimal  Objective  Function  Value 

For  some  discrete  optimization  problems,  the  globally  optimal  objective  function 
value  is  known,  and  the  goal  is  to  identify  an  associated  optimal  solution.  Theorems  3.1 
and  3.2  show  that  when  is  known,  then  any  acceptance  distribution  fimction  that 

satisfies  (3.1),  Theorem  3.1  (c),  and  (3.6)  is  sufficient  for  asymptotic  convergence  to  the 
set  of  globally  optimal  solutions.  Therefore,  the  class  of  GHC  algorithms  presented  in 
Chapter  3,  provides  flexibility  in  selecting  the  acceptance  distribution  while  maintaining 
convergence  properties  to  the  set  of  globally  optimal  solutions. 

One  example  is  the  discrete  event  simulation  problem  ACCESSIBILITY  discussed 
by  Yucesan  and  Jacobson  [1996],  where  the  goal  is  to  identify  a  sequence  of  events  (a 
solution)  that  enables  a  simulation  model  to  terminate  in  a  known  state;  the  objective 
function  value  of  this  state  is  defined  to  be  zero.  For  problems  where  the  optimal 
objective  function  value  is  not  known,  an  estimate  of  is  sufficient  to  execute  the  GHC 

algorithm  class  defined  in  Chapter  3.  The  GHC  algorithm  is  initiated  using  the 

estimate,  and  the  estimate  is  updated  as  the  algorithm  progresses— see  Bohachevsky  et  al. 
[1986].  This  approach  is  being  used  to  optimize  an  Air  Force  manufacturing  process 
design  problem  (Walker  [1992]). 
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7.1.2  Convergence  Using  Linear  Algebra 

The  theorems  in  Chapter  4  provide  a  proof  of  convergence  for  a  general  GHC 
algorithm,  that  does  not  require  knowledge  of  the  optimal  objective  function  value  nor  the 
reversibility  condition.  The  proof  is  based  on  a  perturbation  theory  developed  for 
problems  in  linear  algebra.  However,  these  theorems  imply  that  in  general,  it  is  very 
difficult  to  prove  asymptotic  convergence  of  a  stochastic  hill  climbing  algorithm  by  using 
linear  algebra  techniques,  unless  very  stringent  conditions  (e.g.,  asymptotic  boundedness 
of  the  norm  of  a  transition  matrix  generalized  inverse,  or  asymptotic  nonsingularity  of  the 
transition  matrix),  are  met. 

7.1.3  General  Proof  of  Convergence 

Theorem  5.2’s  proof  of  convergence  of  the  GHC  algorithm,  provides  the  most 
significant  theoretical  contribution  of  this  dissertation.  The  sufficient  conditions  of 
Theorem  5.2  represent  a  relaxation  of  the  most  general  sufficient  conditions  found  in  the 
literature  (Anily  and  Federgruen,  [1987]).  The  principal  shortcoming  of  Theorem  5.2  is 
that  in  practice,  the  conditions  would  be  difficult  to  check  for  problems  with  large  solution 
space  cardinalities.  Two  current  research  issues  are  to  determine  if  the  conditions  of 
Theorem  5.2  are  also  necessary,  and  if  these  conditions  can  be  reformulated  or  fiirther 
relaxed,  in  order  to  make  them  easier  to  verify.  This  is  further  discussed  in  Section  7.2. 

7.1.4  Computational  Results 

Experiments  on  the  FASD  problem,  the  generic  configuration  space  problem,  and 
the  Air  Force  manufacturing  problem,  suggest  that  relationships  do  exist  between  the 
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selected  GHC  algorithm  formulations  and  finite-time  GHC  algorithm  performance.  SA 
performed  well  on  all  three  problems,  and  its  performance  was  relatively  insensitive  to 
changes  in  its  parameter  values.  The  Monte  Carlo  formulation  performed  very  well  on  the 
generic  configuration  space  problem,  yet  performed  poorly  on  the  other  two.  For  the 
FASD  problem,  TA  was  much  more  sensitive  to  parameter  tuning  than  was  SA.  The 
Weibull  formulation  performed  better  for  the  Air  Force  and  generic  configuration  space 
problems  when  the  shape  parameter  a  was  strictly  less  than  one;  conversely,  a  values 
strictly  greater  than  one  worked  better  for  the  FASD  problem.  (Note  that  when  a  is  equal 
to  one,  then  the  Weibull  formulation  is  equivalent  to  SA). 

Additional  research  is  necessary  to  assess  how  the  five  GHC  algorithm  formulations 
perform  on  generic  configuration  spaces  that  exhibit  more  structure  than  the  randomly 
generated  configuration  spaces  used  for  this  dissertation.  This  research  would  determine 
which  GHC  algorithm  formulation  and  what  neighborhood  definition  would  provide  the 
best  finite-time  performance  on  each  type  of  generic  configuration  space.  Additional 
research  would  then  extrapolate  the  results  to  problem  classes  in  the  literature  possessing 
characteristics  similar  to  those  of  the  generic  configuration  space  problems.  The  goal  is  to 
provide  the  practitioner  with  specific  recommendations  for  a  GHC  formulation,  associated 
parameter  values  (subject  to  scaling),  and  a  neighborhood  definition,  that  would  give  the 
best  finite-time  performance  for  a  given  discrete  optimization  problem. 
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7.2  Additional  Extensions 


A  key  limitation  to  the  theorems  of  Chapters  4  and  5  is  the  difficulty  of  checking  the 
sufficient  conditions  for  problems  with  large  solution  space  cardinalities.  One  way  to 
address  this  difficulty  would  be  to  develop  a  way  to  scale  a  small  (Q,5\r,c)  problem 
(where  the  conditions  can  be  easily  checked)  to  a  similar,  but  much  larger-sized  problem. 
Then,  an  algorithm  with  known  asymptotic  convergence  properties  for  the  small  problem 
could  be  presumed  to  exhibit  similar  convergence  properties  for  the  larger  problem.  With 
this  approach,  one  could  derive  bounds  for  the  matrix  norms  of  Chapter  4,  or  bounds  for 
the  equilibrium  probabilities  5^  (^) ,  y  e  Z, ,  for  the  Theorems  of  Chapter  5. 

An  implication  fi-om  Lemma  4.2  is  that,  for  a  given  GHC  algorithm  formulation,  a 
relationship  exists  between  the  number  of  local  optima  and  the  rate  that  the  determinant  of 
the  matrix  Q{k)  approaches  zero.  If  this  relationship  could  be  quantified,  the  result  may  be 
useful  for  developing  a  bound  for  the  rate  at  which  hill  climbing  transition  probabilities 
must  approach  zero. 

The  theorems  of  Chapter  5  lend  themselves  to  extensions  in  several  key  areas.  For 
the  first  extension,  recall  that  the  Theorem  5.2  conditions  (e)  -  (g)  require  that  every  path 
probability  (for  solutions  in  (L  u  G)  )  be  checked.  Instead,  condition  (e)  can  be  relaxed  to 
consider  the  sum  (over  k)  of  the  probabilities  of  only  the  most  difficult  path  (i.e.,  the  path 
of  lowest  probability  fi-om  any  solution  j  eL  to  any  solution  /  g (l u G) ).  Similarly, 
conditions  (f)  and  (g)  can  be  relaxed  to  consider  only  the  maximal  path  probability  (i.e., 
the  easiest  path),  instead  of  considering  the  probabilities  of  all  paths. 
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A  second  extension  is  to  derive  necessary  conditions  for  Theorem  5.2.  If 
conditions  (e)  -  (g)  should  prove  to  be  both  sufficient  and  necessary,  then  Theorem  5.2 
would  entirely  subsume  Anily  and  Federgruen’s  [1987]  result. 

A  final  extension  is  to  relax  the  need  to  know  the  equilibrium  probabilities  5  j(k), 

for  all  j  &  L  and  all  k.  However,  if  this  relaxation  is  infeasible,  then  an  analysis  of 
conditions  (e)  and  (g)  may  provide  implications  on  the  rate  that  6y(Ar)  must  approach 
zero  as  k grows  large,  for  all  j  gL.  This  in  turn  would  provide  a  link  between  finite-time 
performance  and  convergence  rates. 

Research  is  needed  to  assess  the  best  neighborhood  size  for  a  given  problem. 
Equations  (5.1)  and  (5.2)  define  path  probabilities  in  terms  of  the  product  of  transition 
probabilities  between  sets  of  neighboring  solutions.  Research  is  needed  to  assess  whether 
path  probabilities  and  neighborhood  size  are  related,  and  whether  a  bound  on  the  best 
neighborhood  size  could  be  derived  fi'om  these  equations  and  Theorem  5.2. 

A  final  research  topic  is  to  show  how  the  tabu  search  (TS)  heuristic  (Glover 
[1994])  can  be  formulated  as  a  GHC  algorithm,  and  to  derive  conditions  under  which  TS 
will  asymptotically  converge  to  the  set  of  optimal  solutions.  Note  that  probabilistic  TS 
can  be  formulated  to  resemble  SA  (see,  e.g.,  Faigle  and  Kern  [1992]).  Since  SA  is  a  GHC 
algorithm,  hence  the  probabilistic  TS  formulation  is  a  GHC  algorithm.  The  principal 
challenge  is  to  show  that  the  most  general  characteristics  of  TS,  including  short  term 
memory,  long  term  memory,  and  aspiration  criteria,  all  fall  within  the  GHC  rubric.  If  so, 
then  TS  could  claim  the  same  asymptotic  optimality  distinction  enjoyed  by  SA. 
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